(54) Title: COMPLEXITY MANAGEMENT OF GENOMIC DNA BY LOCUS SPECIFIC AMPLIFICATION

(57) Abstract: The present invention provides for novel methods and kits for reducing the complexity of a nucleic acid sample
to interrogate a collection of target sequences. In one embodiment complexity reduction can be accomplished by extension of a
locus specific capture probe followed by amplification of the extended capture probe using common primers. The locus specific
capture probes may be attached to a solid support. Multiple DNA sequences may be amplified simultaneously to produce a reduced
complexity sample. The invention further provides for analysis of the above sample to interrogate sequences of interest such as
polymorphisms. The amplified sample may be hybridized to an array, which may be specifically designed to interrogate the desired
fragments for the presence or absence of a polymorphism.
COMPLEXITY MANAGEMENT OF GENOMIC DNA BY LOCUS SPECIFIC

AMPLIFICATION



RELATED APPLICATIONS
This application is a continuation of U.S. Application No. 10/272,155, filed
October 14, 2002, and claims the benefit of U.S. Application No. 60/389,747, filed June
17, 2002. The entire teachings of the above applications are incorporated herein by
reference.

BACKGROUND OF THE INVENTION

The invention relates to enrichment and amplification of a collection of target
sequences from a nucleic acid sample and methods of analyzing amplified product. In
some embodiments target sequences are amplified by extension of a locus-specific
primer followed by amplification of the extended locus-specific primer with a generic
pair of primers. In some embodiments the locus-specific primers are attached to a solid
support and extension takes place on the solid support. In some embodiments the
invention relates to the preparation of target for array based analysis of genotype. The
present invention relates to the fields of molecular biology and genetics.

The past years have seen a dynamic change in the ability of science to
comprehend vast amounts of data. Pioneering technologies such as nucleic acid arrays
allow scientists to delve into the world of genetics in far greater detail than ever before.
Exploration of genomic DNA has long been a dream of the scientific community. Held
within the complex structures of genomic DNA lies the potential to identify, diagnose,
or treat diseases like cancer, Alzheimer disease or alcoholism. Exploitation of genomic
information from plants and animals may also provide answers to the world's food
distribution problems.

Recent efforts in the scientific community, such as the publication of the draft
sequence of the human genome in February 2001, have changed the dream of genome
exploration into a reality. Genome-wide assays, however, must contend with the
complexity of genomes; the human genome for example is estimated to have a
complexity of 3x10^9 base pairs. Novel methods of sample preparation and sample
analysis that reduce complexity may provide for the fast and cost effective exploration
of complex samples of nucleic acids, particularly genomic DNA.

Single nucleotide polymorphisms (SNPs) have emerged as the marker of choice
for genome wide association studies and genetic linkage studies. Building SNP maps of
the genome will provide the framework for new studies to identify the underlying
genetic basis of complex diseases such as cancer, mental illness and diabetes. Due to
the wide ranging applications of SNPs there is still a need for the development of
robust, flexible, cost-effective technology platforms that allow for scoring genotypes in
large numbers of samples.

SUMMARY OF THE INVENTION 

The present invention provides for novel methods of sample preparation and
analysis comprising managing or reducing the complexity of a nucleic acid sample by
amplification of a collection of target sequences using target specific capture probes. In
some embodiments the extended capture probes are attached to a solid support; in some
embodiments the extended capture probes are in solution. In some embodiments the
amplified collection of target sequences is analyzed by hybridization to an array that is
designed to interrogate sequence variation in the target sequences. In some
embodiments the amplified collection of target sequences is analyzed by hybridization
to an array of tag probes.

In one embodiment a method of amplifying a collection of target sequences
from a nucleic acid sample is disclosed. A collection of capture probes is generated.
The collection comprises a plurality of different species of primers wherein each
species comprises a first common sequence and a 3' variable region that is specific for a
target sequence in the collection of target sequences. Each target sequence is
represented by at least one species of primer which hybridizes to the target sequence
and the collection of capture probes is attached to a solid support so that the 3' end of
the capture probes is available for extension. The nucleic acid sample is fragmented
and an adapter that has a second common sequence is ligated to the fragments.
Fragmentation in some embodiments is by one or more restriction enzymes. The
adapter-ligated fragments are hybridized to the collection of capture probes and the
capture probes are extended using the hybridized adapter-ligated fragments as template
for extension, thereby incorporating the target sequence and the second common
sequence into the 3' end of the extended capture probe. The extended capture probes
are then amplified using first and second common sequence primers.
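
The extension step described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the method as practiced: hybridization is reduced to an exact string match, the sequences A1, A2 and VARIABLE are invented for the example, and the helper names are hypothetical. The sketch places the adapter sequence A2 at the 5' end of the template strand, so the extended probe ends in the reverse complement of A2; which strand is called the "second common sequence" is a naming choice made here.

```python
# Illustrative sketch only: all sequences and helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_capture_probe(first_common: str, variable: str, template: str):
    """Model probe extension on an adapter-ligated template strand.

    The probe's 3' variable region anneals where its reverse complement
    occurs in the template (exact match assumed). The polymerase then copies
    the template from the annealing site to the template's 5' end.
    Returns the extended probe (5'->3'), or None if the probe does not anneal.
    """
    site = revcomp(variable)
    pos = template.find(site)
    if pos == -1:
        return None
    return first_common + variable + revcomp(template[:pos])


def is_amplifiable(extended, first_common: str, second_common: str) -> bool:
    """True if both common priming sites flank the extended probe."""
    return (extended is not None
            and extended.startswith(first_common)
            and extended.endswith(revcomp(second_common)))


A1 = "GATTACAGATTACA"        # hypothetical first common sequence
A2 = "CCTTGGAACCTTGG"        # hypothetical adapter (second common) sequence
VARIABLE = "TTGACCGTAAGC"    # hypothetical locus-specific 3' region

# Template strand, 5'->3': adapter, target sequence, probe annealing site, flank.
TEMPLATE = A2 + "ATGCCGTA" + revcomp(VARIABLE) + "TTTT"

extended = extend_capture_probe(A1, VARIABLE, TEMPLATE)
print(extended, is_amplifiable(extended, A1, A2))
```

Only probes that anneal and extend through the adapter carry both priming sites, which is why a single pair of common primers can amplify every extended probe in the collection while unextended probes and unbound fragments are not exponentially amplified.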

In some embodiments the capture probes are attached to the solid support
through a covalent interaction. In another embodiment there is a tag sequence in the
capture probes that is unique for each species of capture probe and the capture probes
are attached to the solid support by hybridization to a collection of tag probes that are
covalently attached to the solid support. In some embodiments each species of capture
probe is attached to the solid support in a discrete location.

In another embodiment the extended capture probes are released from the solid
support prior to amplification. Prior to releasing the extended capture probes from the
solid support, nucleic acids that are not covalently attached to the solid support may be
removed.

In another embodiment the extended capture probes are enriched prior to
amplification. In some embodiments capture probes are enriched by incorporation of
labeled nucleotides into the extended capture probes followed by isolation of labeled
capture probes by affinity chromatography. In some embodiments capture probes are
labeled with biotin, and avidin, streptavidin or an anti-biotin antibody, which may be
monoclonal, may be used to isolate extended capture probes. In another embodiment
extended capture probes are made double stranded and single stranded nucleic acid in
the sample is digested by, for example, a nuclease, such as, for example, Exonuclease I.
In another embodiment the extended capture probes are circularized prior to
amplification and uncircularized nucleic acid in the sample is digested by, for example,
a nuclease, such as, for example, Exonuclease III. In some embodiments the extended
capture probes are circularized by hybridizing an oligonucleotide splint to the extended
capture probes so that the 5' and 3' ends of extended capture probes are juxtaposed and
then ligating the ends of the extended capture probes.

In one embodiment a method of genotyping one or more polymorphic locations
in a sample is disclosed. An amplified collection of target sequences from the sample is
prepared and hybridized to an array designed to interrogate at least one polymorphic
location in the collection of target sequences. The hybridization pattern is analyzed to
determine the identity of the allele or alleles present at one or more polymorphic
locations in the collection of target sequences.

In another embodiment a method for analyzing sequence variations in a
population of individuals is disclosed. A nucleic acid sample is obtained from each
individual and a collection of target sequences from each nucleic acid sample is
amplified. Each amplified collection of target sequences is hybridized to an array
designed to interrogate sequence variation in the collection of target sequences to
generate a hybridization pattern for each sample and the hybridization patterns are
analyzed or compared to determine the presence or absence of sequence variation in the
population of individuals.

In another embodiment a method of amplifying a collection of target sequences
from a nucleic acid sample in solution is disclosed. A collection of capture probes is
generated. The collection comprises a plurality of different species of primers wherein
each species comprises a first common sequence and a 3' variable region that is specific
for a target sequence wherein each target sequence in a collection of target sequences is
represented by at least one species of primer which hybridizes to the target sequence.
The nucleic acid sample is fragmented and an adapter is ligated to the fragments so that
the strand that is ligated to the 5' end of the fragment strands comprises a second
common sequence and the strand that is ligated to the 3' end of the fragments lacks the
second common sequence and is blocked from extension at the 3' end. The
adapter-ligated fragments are hybridized to the collection of capture probes and the capture
probes are extended using the hybridized adapter-ligated fragments as template for
extension, thereby incorporating the target sequence and the complement of the
second common sequence into the extended capture probes. The extended capture
probes are then amplified with first and second common sequence primers.

In one embodiment an amino group is used to block extension at the 3' end of
the adapter strand.

In some embodiments fragmentation of the nucleic acid sample is by digestion 
with one or more restriction enzymes. 

In another embodiment a method for genotyping one or more polymorphisms in
a nucleic acid sample is disclosed. The nucleic acid sample is fragmented and an
adaptor comprising a first common priming sequence is ligated to the fragments. A
collection of capture probes is ligated to the fragments. The capture probes have a
second common priming sequence, a tag sequence unique for each species of capture
probe, a first locus specific sequence, a Type IIs restriction enzyme recognition
sequence, and a second locus specific sequence. The Type IIs restriction enzyme
recognition sequence is positioned so that the enzyme will cut immediately 5' of the
polymorphic base in a target sequence. The capture probes are extended to generate
single-stranded extension products and then amplified using the first and second
common sequence primers. The amplified product is digested with a Type IIs
restriction enzyme and the fragments are extended in the presence of one or more types
of labeled ddNTP. In one embodiment the extension is done in four separate reactions,
one for each ddNTP, and the ddNTPs may be labeled with the same label. The extended
fragments are then hybridized to four separate arrays. In another embodiment the
ddNTPs are differentially labeled with at least two different labels and the extension
reactions may be done in fewer than four reactions and each reaction may be hybridized
to a separate array. The arrays are arrays of tag probes that hybridize to the tag
sequences in the capture probes. The hybridization pattern on each of the arrays is
analyzed to determine at least one genotype.
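
As a rough illustration of the final analysis step in the four-reaction format, the sketch below calls a genotype for one tag probe from its signal on each of the four arrays (one array per ddNTP). The thresholds, the dictionary layout and the function name are assumptions made for this example; the source does not specify a calling algorithm, and whether the incorporated base or its complement is reported depends on which strand is extended.

```python
# Illustrative sketch only: thresholds and data layout are hypothetical.
def call_genotype(signals: dict, ratio_cutoff: float = 0.5,
                  min_signal: float = 100.0):
    """Call a genotype for one tag probe.

    `signals` maps each ddNTP base ("A", "C", "G", "T") to the intensity
    measured for the same tag probe on the array hybridized with that
    reaction. A base is called if its signal is at least `ratio_cutoff`
    times the strongest signal. Returns e.g. "A/A" or "A/G", or None when
    the signal is too weak or more than two bases pass the cutoff.
    """
    top = max(signals.values())
    if top < min_signal:
        return None  # no call: nothing was incorporated above background
    called = sorted(base for base, s in signals.items()
                    if s >= ratio_cutoff * top)
    if len(called) > 2:
        return None  # ambiguous for a diploid sample
    if len(called) == 1:
        called = called * 2  # homozygous
    return "/".join(called)


print(call_genotype({"A": 1000.0, "C": 20.0, "G": 30.0, "T": 10.0}))
print(call_genotype({"A": 900.0, "C": 20.0, "G": 800.0, "T": 10.0}))
```

Because every capture probe species carries its own tag, the same four generic tag arrays can report any set of loci; only the mapping from tag to locus changes between experiments.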

In some embodiments the ddNTPs are labeled with biotin.

In another embodiment one of the common sequence primers is resistant to
nuclease digestion and the sample is treated with a nuclease that cleaves 5' to 3' after the
fragments are extended in the presence of labeled ddNTP. In one embodiment the
primer is resistant to nuclease digestion because it contains phosphorothioate linkages.

In some embodiments the nuclease is T7 Gene 6 Exonuclease.

In another embodiment a method for screening for sequence variations in a
population of individuals is disclosed. A nucleic acid sample from each individual is
provided and the sample is amplified and genotyped by one of the methods of the
invention and the genotypes from the samples are compared to determine the presence
or absence of sequence variation in the population of individuals.

In another embodiment a kit for amplifying a collection of target sequences is
disclosed. The kit has a collection of capture probes that is specific for a collection of
target sequences and has a first common sequence that is common to all of the capture
probes; an adapter that has a second common sequence; and a pair of first and second
common sequence primers. In another embodiment the collection of capture probes in
the kit is covalently attached to a solid support so that the 3' end of the capture probes is
available for extension. In another embodiment the kit also provides a restriction
enzyme, buffer, DNA polymerase and dNTPs. In some embodiments the restriction
enzyme is a Type IIs restriction enzyme. In another embodiment the kit also contains a
ligase, dNTPs, ddNTPs, buffer and DNA polymerase. In some embodiments one of the
common sequence primers is resistant to nuclease digestion.

In another embodiment the capture probes also have a tag sequence unique for
each species of capture probe and a Type IIs restriction enzyme recognition sequence.
In another embodiment the adapter has a first strand comprising a common sequence
and a second strand that does not contain the complement of that common sequence and
the second strand is blocked from extension at the 3' end by, for example, an amino
group.

In another embodiment a collection of capture probes attached to a solid support
is disclosed. The solid support may be arrays, beads, microparticles, microtitre dishes or
gels.

In another embodiment a plurality of oligonucleotides attached to a solid support
is disclosed. The solid support may be arrays, beads, microparticles, microtitre dishes or
gels. The oligonucleotides may be released and used for a variety of analyses. The
plurality of oligonucleotides may comprise a collection of capture probes.


BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, features and advantages of the invention will
be apparent from the following more particular description of preferred embodiments of
the invention, as illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views. The drawings are not
necessarily to scale, emphasis instead being placed upon illustrating the principles of
the invention.

Figure 1 shows a method of amplifying specific target sequences using a capture
probe that is locus specific and genomic DNA that has been ligated to an adapter. The
capture probes are attached to a solid support and extended to incorporate the sequence
of interest and the adapter sequence. The extended capture probes are released from the
solid support and amplified with a single primer pair.

Figure 2 shows a method where the capture probes are attached to a solid
support by hybridization to a probe that is covalently attached to the solid support. The
probes on the array are complementary to a tag sequence in the 5' region of the capture
probe. The capture probe hybridizes so that the 3' end is available for extension.

Figure 3 shows a schematic of solution-based multiplexed SNP genotyping. A
sample is fragmented and ligated to an adaptor so that the adaptor sequence that
hybridizes to the 3' end of the strands of the fragments is blocked from extension.
Locus specific capture probes are hybridized to the fragments and extended in solution
then amplified by PCR using primers to A1 and A2. Prior to amplification the extended
capture probes may be enriched by, for example, removal of non-extended products or
by positive selection of extended products.

Figures 4A-4B show a method of multiplexed anchored runoff amplification
wherein the alleles present at different polymorphic positions are analyzed by
hybridization to an array of tag probes. The capture probe includes a recognition site
for a Type IIs restriction enzyme so that the enzyme cuts immediately upstream of the
polymorphic locus. The capture probe is extended by one labeled nucleotide and the
identity of the nucleotide is determined by hybridization to an array of probes that are
complementary to the tag sequences in the capture probes.

Figure 5 shows an enrichment scheme. Biotin is incorporated into the extended
capture probes and biotin labeled extended capture probes are selected by affinity
chromatography.

Figure 6 shows another enrichment scheme using nuclease that is specific for
single stranded nucleic acid. Capture probes that are fully extended through the adapter
site on the genomic DNA fragment are converted to double stranded DNA by annealing
and extension of a primer that hybridizes to the adapter sequence.

Figure 7 shows another enrichment scheme. The ends of the extended capture
probes are ligated together to form a circle using a splint oligonucleotide that is
complementary to the primer sites at the ends of the extended capture probes. The
sample is digested with an exonuclease so circularized sequences are protected from
digestion.



DETAILED DESCRIPTION OF THE INVENTION

A description of preferred embodiments of the invention follows. 
(A) General 

The present invention has many preferred embodiments and relies on many
patents, applications and other references for details known to those of the art.
Therefore, when a patent, application, or other reference is cited or repeated below, it
should be understood that it is incorporated by reference in its entirety for all purposes
as well as for the proposition that is recited.

As used in this application, the singular forms "a," "an," and "the" include plural
references unless the context clearly dictates otherwise. For example, the term "an
agent" includes a plurality of agents, including mixtures thereof.

An individual is not limited to a human being but may also be other organisms 
including but not limited to mammals, plants, bacteria, or cells derived from any of the 
above. 

Throughout this disclosure, various aspects of this invention can be presented in
a range format. It should be understood that the description in range format is merely
for convenience and brevity and should not be construed as an inflexible limitation on
the scope of the invention. Accordingly, the description of a range should be
considered to have specifically disclosed all the possible sub-ranges as well as
individual numerical values within that range. For example, description of a range such
as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as
from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as
individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. The same holds
true for ranges in increments of 10^5, 10^4, 10^3, 10^2, 10, 10^-1, 10^-2, 10^-3, 10^-4, or 10^-5, for
example. This applies regardless of the breadth of the range.

The practice of the present invention may employ, unless otherwise indicated,
conventional techniques and descriptions of organic chemistry, polymer technology,
molecular biology (including recombinant techniques), cell biology, biochemistry, and
immunology, which are within the skill of the art. Such conventional techniques
include polymer array synthesis, hybridization, ligation, and detection of hybridization
using a label. Specific illustrations of suitable techniques can be had by reference to the
example herein below. However, other equivalent conventional procedures can, of
course, also be used. Such conventional techniques and descriptions can be found in
standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series
(Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual,
PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all
from Cold Spring Harbor Laboratory Press), Stryer (anyone have the cite), Gait,
"Oligonucleotide Synthesis: A Practical Approach" 1984, IRL Press, London, Nelson
and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W.H. Freeman Pub.,
New York, NY and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New
York, NY, all of which are herein incorporated in their entirety by reference for all
purposes.

The present invention can employ solid substrates, including arrays in some
preferred embodiments. Methods and techniques applicable to polymer (including
protein) array synthesis have been described in U.S.S.N. 09/536,841, WO 00/58516,
U.S. Patents Nos. 5,143,854, 5,242,974, 5,252,743, 5,324,633, 5,384,261, 5,424,186,
5,451,683, 5,482,867, 5,491,074, 5,527,681, 5,550,215, 5,571,639, 5,578,832,
5,593,839, 5,599,695, 5,624,711, 5,631,734, 5,795,716, 5,831,070, 5,837,832,
5,856,101, 5,858,659, 5,936,324, 5,968,740, 5,974,164, 5,981,185, 5,981,956,
6,025,601, 6,033,860, 6,040,193, 6,090,555, and 6,136,269, in PCT Applications Nos.
PCT/US99/00730 (International Publication Number WO 99/36760) and
PCT/US01/04285, and in U.S. Patent Applications Serial Nos. 09/501,099 and 09/122,216,
which are all incorporated herein by reference in their entirety for all purposes.

Patents that describe synthesis techniques in specific embodiments include U.S.
Patents Nos. 5,412,087, 6,147,205, 6,262,216, 6,310,189, 5,889,165 and 5,959,098,
which are each incorporated herein by reference in their entirety for all purposes.
Nucleic acid arrays are described in many of the above patents, but the same techniques
are applied to polypeptide arrays.

The present invention also contemplates many uses for polymers attached to
solid substrates. These uses include gene expression monitoring, profiling, library
screening, genotyping, and diagnostics. Gene expression monitoring and profiling
methods are shown in U.S. Patents Nos. 5,800,992, 6,013,449, 6,020,135, 6,033,860,
6,040,138, 6,177,248 and 6,309,822. Genotyping and uses therefor are shown in
USSN 10/013,598, and U.S. Patents Nos. 5,856,092, 6,300,063, 5,858,659, 6,284,460,
6,361,947, 6,368,799 and 6,333,179, which are each incorporated herein by reference.
Other uses are embodied in U.S. Patents Nos. 5,871,928, 5,902,723, 6,045,996,
5,541,061, and 6,197,506, which are incorporated herein by reference.

The present invention also contemplates sample preparation methods in certain
preferred embodiments. For example, see the patents in the gene expression, profiling,
genotyping and other use patents above, as well as USSN 09/854,317, U.S. Patent Nos.
5,437,990, 5,215,899, 5,466,586, 4,357,421, and Gubler et al., 1985, Biochimica et
Biophysica Acta, Displacement Synthesis of Globin Complementary DNA: Evidence
for Sequence Amplification.

Prior to or concurrent with analysis, the nucleic acid sample may be amplified
by a variety of mechanisms, some of which may employ PCR. See, e.g., PCR
Technology: Principles and Applications for DNA Amplification (Ed. H.A. Erlich,
Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications
(Eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids
Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR
(Eds. McPherson et al., IRL Press, Oxford); and U.S. Patent Nos. 4,683,202, 4,683,195,
4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference
in its entirety for all purposes. The sample may be amplified on the array. See, for
example, U.S. Patent No. 6,300,070 and U.S. patent application 09/513,300, which are
incorporated herein by reference.

Other suitable amplification methods include the ligase chain reaction (LCR)
(e.g., Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077
(1988) and Barringer et al., Gene 89:117 (1990)), transcription amplification (Kwoh et
al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained
sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990),
WO88/10315 and WO90/06995), selective amplification of target polynucleotide
sequences (U.S. Patent No. 6,410,276), consensus sequence primed polymerase chain
reaction (CP-PCR) (U.S. Patent No. 4,437,975), arbitrarily primed polymerase chain
reaction (AP-PCR) (U.S. Patent Nos. 5,413,909 and 5,861,245) and nucleic acid based
sequence amplification (NABSA). (See U.S. Patent Nos. 5,409,818, 5,554,517, and
6,063,603, each of which is incorporated herein by reference.) Other amplification
methods that may be used are described in U.S. Patent Nos. 5,242,794, 5,494,810,
4,988,617 and in USSN 09/854,317, each of which is incorporated herein by reference.

Additional methods of sample preparation and techniques for reducing the
complexity of a nucleic sample are described in Dong et al., Genome Research 11, 1418
(2001), in U.S. Patent Nos. 6,361,947 and 6,391,592, and in U.S. Patent Application Nos.
09/512,300, 09/916,135, 09/920,491, 09/910,292, and 10/013,598, which are
incorporated herein by reference in their entireties.

The present invention also contemplates detection of hybridization between
ligands in certain preferred embodiments. See U.S. Pat. Nos. 5,143,854; 5,578,832;
5,631,734; 5,834,758; 5,936,324; 5,981,956; 6,025,601; 6,141,096; 6,185,030;
6,201,639; 6,218,803; and 6,225,625 and PCT Application PCT/US99/06097
(published as WO99/47964), each of which also is hereby incorporated by reference in
its entirety for all purposes.

The practice of the present invention may also employ conventional biology
methods, software and systems. Computer software products of the invention typically
include a computer readable medium having computer-executable instructions for
performing the logic steps of the method of the invention. Suitable computer readable
media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash
memory, ROM/RAM, magnetic tapes, etc. The computer executable instructions
may be written in a suitable computer language or combination of several languages.
Basic computational biology methods are described in, e.g., Setubal and Meidanis et al.,
Introduction to Computational Biology Methods (PWS Publishing Company, Boston,
1997); Salzberg, Searles, Kasif (Ed.), Computational Methods in Molecular Biology
(Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application
in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and
Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley
& Sons, Inc., 2nd ed., 2001).

The present invention may also make use of various computer program products
and software for a variety of purposes, such as probe design, management of data,
analysis, and instrument operation. See U.S. Pat. Nos. 5,593,839, 5,795,716,
5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127,
6,229,911 and 6,308,170.

Additionally, the present invention may have preferred embodiments that
include methods for providing genetic information over the internet. See U.S. patent
applications and provisional applications 10/063,559, 60/349,546, 60/376,003,
60/394,574, and 60/403,381.

The present invention provides a flexible and scalable method for analyzing
complex samples of nucleic acids, such as genomic DNA. These methods are not
limited to any particular type of nucleic acid sample: plant, bacterial, animal (including
human) total genome DNA, RNA, cDNA and the like may be analyzed using some or
all of the methods disclosed in this invention. The word "DNA" may be used below as
an example of a nucleic acid. It is understood that this term includes all nucleic acids,
such as DNA and RNA, unless a use below requires a specific type of nucleic acid.
This invention provides a powerful tool for analysis of complex nucleic acid samples.
From experimental design to isolation of desired fragments and hybridization to an
appropriate array, the invention provides for fast, efficient and inexpensive methods of
complex nucleic acid analysis.

(B) Definitions 

Nucleic acids according to the present invention may include any polymer or
oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and
adenine and guanine, respectively. (See Albert L. Lehninger, Principles of
Biochemistry, at 793-800 (Worth Pub. 1982) which is herein incorporated in its entirety
for all purposes). Indeed, the present invention contemplates any deoxyribonucleotide,
ribonucleotide or peptide nucleic acid component, and any chemical variants thereof,
such as methylated, hydroxymethylated or glucosylated forms of these bases, and the
like. The polymers or oligomers may be heterogeneous or homogeneous in
composition, and may be isolated from naturally occurring sources or may be
artificially or synthetically produced. In addition, the nucleic acids may be DNA or
RNA, or a mixture thereof, and may exist permanently or transitionally in
single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid
states.

An "oligonucleotide" or "polynucleotide" is a nucleic acid ranging from at least
2, preferably at least 8, 15 or 20 nucleotides in length, but may be up to 50, 100, 1000,
or 5000 nucleotides long or a compound that specifically hybridizes to a polynucleotide.
Polynucleotides of the present invention include sequences of deoxyribonucleic acid
(DNA) or ribonucleic acid (RNA) or mimetics thereof which may be isolated from
natural sources, recombinantly produced or artificially synthesized. A further example
of a polynucleotide of the present invention may be a peptide nucleic acid (PNA). (See
U.S. Patent No. 6,156,501 which is hereby incorporated by reference in its entirety.)
The invention also encompasses situations in which there is a nontraditional base
pairing such as Hoogsteen base pairing which has been identified in certain tRNA
molecules and postulated to exist in a triple helix. "Polynucleotide" and
"oligonucleotide" are used interchangeably in this application.

The term "fragment," "segment," or "DNA segment" refers to a portion of a
larger DNA polynucleotide or DNA. A polynucleotide, for example, can be broken up,
or fragmented, into a plurality of segments. Various methods of fragmenting nucleic
acid are well known in the art. These methods may be, for example, either chemical or
physical in nature. Chemical fragmentation may include partial degradation with a
DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded
endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation
methods, that rely on the specific hybridization of a nucleic acid segment to localize a
cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or
compounds which cleave DNA at known or unknown locations (see, for example,
USSN 09/358,664). Physical fragmentation methods may involve subjecting the DNA
to a high shear rate. High shear rates may be produced, for example, by moving DNA
through a chamber or channel with pits or spikes, or forcing the DNA sample through a
restricted size flow passage, e.g., an aperture having a cross sectional dimension in the
micron or submicron scale. Other physical methods include sonication and
nebulization. Combinations of physical and chemical fragmentation methods may
likewise be employed such as fragmentation by heat and ion-mediated hydrolysis. See
for example, Sambrook et al., "Molecular Cloning: A Laboratory Manual," 3rd Ed. Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, New York (2001) ("Sambrook et
al.") which is incorporated herein by reference for all purposes. These methods can be
optimized to digest a nucleic acid into fragments of a selected size range. Useful size
ranges may be from 100, 200, 400, 700 or 1000 to 500, 800, 1500, 2000, 4000 or
10,000 base pairs. However, larger size ranges such as 4000, 10,000 or 20,000 to
10,000, 20,000 or 500,000 base pairs may also be useful.

A number of methods disclosed herein require the use of restriction enzymes to
fragment the nucleic acid sample. In general, a restriction enzyme recognizes a specific
nucleotide sequence of four to eight nucleotides and cuts the DNA at a site within or a
specific distance from the recognition sequence. For example, the restriction enzyme
EcoRI recognizes the sequence GAATTC and will cut a DNA molecule between the G
and the first A. The length of the recognition sequence roughly determines the
frequency of occurrence of the site in the genome: the longer the recognition sequence,
the rarer the site. A simplistic theoretical estimate is
that a six base pair recognition sequence will occur once in every 4096 (4^6) base pairs
while a four base pair recognition sequence will occur once every 256 (4^4) base pairs.
In silico digestions of sequences from the Human Genome Project show that the actual
occurrences may be more or less frequent, depending on the sequence of the restriction
site. Because the restriction sites are rare, the appearance of shorter restriction
fragments, for example those less than 1000 base pairs, is much less frequent than the
appearance of longer fragments. Many different restriction enzymes are known and
appropriate restriction enzymes can be selected for a desired result. (For a description
of many restriction enzymes see New England BioLabs Catalog which is herein
incorporated by reference in its entirety for all purposes).
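
The frequency estimate and the idea of an in silico digestion can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the toy sequence and the single-strand treatment are assumptions made for the example, and the 4^n estimate assumes all four bases are equally likely.

```python
import re


def expected_spacing(site_length: int) -> int:
    """Average distance between sites of a given length, assuming equal base use (4^n)."""
    return 4 ** site_length


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut one strand at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    # A lookahead is used so that overlapping occurrences of the site are not skipped.
    for match in re.finditer(f"(?={site})", sequence):
        cut = match.start() + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments


six_cutter = expected_spacing(6)   # 4096
four_cutter = expected_spacing(4)  # 256

# Hypothetical toy sequence with two EcoRI sites (GAATTC, cut between G and the first A).
toy = "TTGAATTCAAGGGAATTCCC"
fragments = digest(toy, "GAATTC", 1)
print(six_cutter, four_cutter, fragments)
```

Running the same function over real genomic sequence, rather than the toy string, is what reveals how far actual site frequencies depart from the 4^n estimate.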

Type-IIs endonucleases are a class of endonuclease that, like other
endonucleases, recognize specific sequences of nucleotide base pairs within a double
stranded polynucleotide sequence. Upon recognizing that sequence, the endonuclease
will cleave the polynucleotide sequence, generally leaving an overhang of one strand of
the sequence, or "sticky end." The Type-IIs endonucleases are unique because they
generally do not require palindromic recognition sequences and they generally cleave
outside of their recognition sites. For example, the Type-IIs endonuclease EarI
recognizes and cleaves in the following manner:

                ↓
5'-C-T-C-T-T-C-N-N-N-N-N-3' (SEQ ID NO: 1)
3'-G-A-G-A-A-G-n-n-n-n-n-5' (SEQ ID NO: 2)
                      ↑

where the recognition sequence is -C-T-C-T-T-C-, N and n represent complementary,
ambiguous base pairs and the arrows indicate the cleavage sites in each strand. As the
example illustrates, the recognition sequence is non-palindromic, and the cleavage
occurs outside of that recognition site.
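
A minimal sketch of how cleavage outside the recognition site can be located computationally is given below. The offsets of 1 and 4 bases past the recognition sequence are an assumption taken from the commonly published EarI notation CTCTTC(1/4); the function name and the toy sequence are hypothetical, and only sites in the forward orientation are scanned.

```python
def type_iis_cuts(sequence: str, site: str = "CTCTTC",
                  top_offset: int = 1, bottom_offset: int = 4) -> list[tuple[int, int]]:
    """Return (top_cut, bottom_cut) positions for each forward-orientation site.

    Positions are 0-based indices into the top strand; a cut at index i falls
    between bases i-1 and i. Offsets are counted from the end of the site.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        end = pos + len(site)
        cuts.append((end + top_offset, end + bottom_offset))
        pos = sequence.find(site, pos + 1)
    return cuts


# Hypothetical toy sequence containing one EarI site.
seq = "AACTCTTCGATTACAGG"
top_cut, bottom_cut = type_iis_cuts(seq)[0]
# Bases between the two staggered cuts form the single-stranded overhang.
overhang = seq[top_cut:bottom_cut]
print(top_cut, bottom_cut, overhang)
```

Because the overhang lies outside the recognition sequence, its bases depend on the neighboring sequence rather than on the enzyme, which is the property that distinguishes this class from palindromic cutters such as EcoRI.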

Type-IIs endonucleases are generally commercially available and are well
known in the art. Specific Type-IIs endonucleases which are useful in the present
invention include, e.g., BbvI, BceAI, BfuAI, EarI, AlwI, BbsI, BsaI, BsmAI, BsmBI,
BspMI, HgaI, SapI, SfaNI, BsmFI, FokI and PleI. Other Type-IIs endonucleases that
may be useful in the present invention may be found, for example, in the New England
Biolabs catalogue. In some embodiments Type-IIs enzymes that generate a recessed 3'
end are particularly useful.

"Adaptor sequences" or "adaptors" are generally oligonucleotides of at least 5,
10, or 15 bases and preferably no more than 50 or 60 bases in length; however, they
may be even longer, up to 100 or 200 bases. Adaptor sequences may be synthesized
using any methods known to those of skill in the art. For the purposes of this invention
they may, as options, comprise primer binding sites, recognition sites for
endonucleases, common sequences and promoters. The adaptor may be entirely or
substantially double stranded. A double stranded adaptor may comprise two
oligonucleotides that are at least partially complementary. The adaptor may be
phosphorylated or unphosphorylated on one or both strands. Adaptors may be more
efficiently ligated to fragments if they comprise a substantially double stranded region
and a short single stranded region which is complementary to the single stranded region
created by digestion with a restriction enzyme. For example, when DNA is digested
with the restriction enzyme EcoRI the resulting double stranded fragments are flanked
at either end by the single stranded overhang 5'-AATT-3'. An adaptor that carries a
single stranded overhang 5'-AATT-3' will hybridize to the fragment through
complementarity between the overhanging regions. This "sticky end" hybridization of
the adaptor to the fragment may facilitate ligation of the adaptor to the fragment but
blunt ended ligation is also possible. Blunt ends can be converted to sticky ends using
the exonuclease activity of the Klenow fragment. For example, when DNA is digested
with PvuII the blunt ends can be converted to a two base pair overhang by incubating
the fragments with Klenow in the presence of dTTP and dCTP. Overhangs may also be
converted to blunt ends by filling in an overhang or removing an overhang.

Methods of ligation will be known to those of skill in the art and are described,
for example, in Sambrook et al. (2001) and the New England BioLabs catalog, both of
which are incorporated herein by reference for all purposes. Methods include using T4
DNA Ligase which catalyzes the formation of a phosphodiester bond between
juxtaposed 5' phosphate and 3' hydroxyl termini in duplex DNA or RNA with blunt and
sticky ends; Taq DNA Ligase which catalyzes the formation of a phosphodiester bond
between juxtaposed 5' phosphate and 3' hydroxyl termini of two adjacent
oligonucleotides which are hybridized to a complementary target DNA; E. coli DNA
ligase which catalyzes the formation of a phosphodiester bond between juxtaposed
5'-phosphate and 3'-hydroxyl termini in duplex DNA containing cohesive ends; and T4
RNA ligase which catalyzes ligation of a 5' phosphoryl-terminated nucleic acid donor
to a 3' hydroxyl-terminated nucleic acid acceptor through the formation of a 3'->5'
phosphodiester bond (substrates include single-stranded RNA and DNA as well as
dinucleoside pyrophosphates); or any other methods described in the art.

When a fragment has been digested on both ends with the same enzyme or two
enzymes that leave the same overhang, the same adaptor may be ligated to both ends.
Digestion with two or more enzymes can be used to selectively ligate separate adaptors
to either end of a restriction fragment. For example, if a fragment is the result of
digestion with EcoRI at one end and BamHI at the other end, the overhangs will be
5'-AATT-3' and 5'-GATC-3', respectively. An adaptor with an overhang of AATT will be
preferentially ligated to one end while an adaptor with an overhang of GATC will be
preferentially ligated to the second end.
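
The pairing of adaptors to fragment ends by overhang can be illustrated with a short Python sketch. The adaptor names and the dictionary layout are hypothetical; the only rule modeled is that two overhangs, each written 5' to 3', anneal when one is the reverse complement of the other. Both AATT and GATC are their own reverse complements, which is why an adaptor carrying the same overhang as the fragment end is the compatible one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def compatible(fragment_overhang: str, adaptor_overhang: str) -> bool:
    """Two 5'->3' overhangs anneal when one is the reverse complement of the other."""
    return adaptor_overhang == reverse_complement(fragment_overhang)


# Hypothetical adaptor set and the two ends of an EcoRI/BamHI fragment.
adaptors = {"adaptor_1": "AATT", "adaptor_2": "GATC"}
fragment_ends = {"EcoRI end": "AATT", "BamHI end": "GATC"}

pairing = {
    end: [name for name, overhang in adaptors.items() if compatible(end_overhang, overhang)]
    for end, end_overhang in fragment_ends.items()
}
print(pairing)
```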

An adaptor may be ligated to one or both strands of the fragmented DNA. In
some embodiments a double stranded adaptor is used but only one strand is ligated to
the fragments. Ligation of one strand of an adaptor may be selectively blocked. Any
known method to block ligation of one strand may be employed. For example, one
strand of the adaptor can be designed to introduce a gap of one or more nucleotides
between the 5' end of that strand of the adaptor and the 3' end of the target nucleic acid.
Adapters can be designed specifically to be ligated to the termini produced by
restriction enzymes and to introduce gaps or nicks. For example, if the target is an
EcoRI digested fragment an adapter with a 5' overhang of TTA could be ligated to the
AATT overhang left by EcoRI to introduce a single nucleotide gap between the adaptor
and the 3' end of the fragment. Phosphorylation and kinasing can also be used to
selectively block ligation of the adaptor to the 3' end of the target molecule. Absence
of a phosphate from the 5' end of an adaptor will block ligation of that 5' end to an
available 3' OH. For additional adaptor methods for selectively blocking ligation see
U.S. Patent 6,197,557 and USSN 09/910,292 which are incorporated by reference
herein in their entirety for all purposes.

Adaptors may also incorporate modified nucleotides that modify the properties
of the adaptor sequence. For example, phosphorothioate groups may be incorporated in
one of the adaptor strands. A phosphorothioate group is a modified phosphate group
with one of the oxygen atoms replaced by a sulfur atom. In a phosphorothioated oligo
(often called an "S-Oligo"), some or all of the internucleotide phosphate groups are
replaced by phosphorothioate groups. The modified backbone of an S-Oligo is resistant
to the action of most exonucleases and endonucleases. Phosphorothioates may be
incorporated between all residues of an adaptor strand, or at specified locations within a
sequence. A useful option is to sulfurize only the last few residues at each end of the
oligo. This results in an oligo that is resistant to exonucleases, but has a natural DNA
center.

A genome is all the genetic material of an organism. In some instances, the term
genome may refer to the chromosomal DNA. A genome may be multichromosomal such
that the DNA is cellularly distributed among a plurality of individual chromosomes.
For example, in human there are 22 pairs of chromosomes plus a gender associated XX
or XY pair. DNA derived from the genetic material in the chromosomes of a particular
organism is genomic DNA. The term genome may also refer to genetic materials from
organisms that do not have chromosomal structure. In addition, the term genome may
refer to mitochondrial DNA. A genomic library is a collection of DNA fragments
representing the whole or a portion of a genome. Frequently, a genomic library is a
collection of clones made from a set of randomly generated, sometimes overlapping
DNA fragments representing the entire genome or a portion of the genome of an
organism.

The term "chromosome" refers to the heredity-bearing gene carrier of a living
cell which is derived from chromatin and which comprises DNA and protein
components (especially histones). The conventional internationally recognized
individual human genome chromosome numbering system is employed herein. The
size of an individual chromosome can vary from one type to another within a given
multichromosomal genome and from one genome to another. In the case of the human
genome, the entire DNA mass of a given chromosome is usually greater than about
100,000,000 bp. For example, the size of the entire human genome is about 3 x 10^9 bp.
The largest chromosome, chromosome no. 1, contains about 2.4 x 10^8 bp while the
smallest chromosome, chromosome no. 22, contains about 5.3 x 10^7 bp.

A "chromosomal region" is a portion of a chromosome. The actual physical size
or extent of any individual chromosomal region can vary greatly. The term "region" is
not necessarily definitive of a particular one or more genes because a region need not
take into specific account the particular coding segments (exons) of an individual gene.

An allele refers to one specific form of a genetic sequence (such as a gene)
within a cell, an individual or within a population, the specific form differing from other
forms of the same gene in the sequence of at least one, and frequently more than one,
variant sites within the sequence of the gene. The sequences at these variant sites that
differ between different alleles are termed "variances", "polymorphisms", or
"mutations". At each autosomal specific chromosomal location or "locus" an individual
possesses two alleles, one inherited from one parent and one from the other parent, for
example one from the mother and one from the father. An individual is "heterozygous"
at a locus if it has two different alleles at that locus. An individual is "homozygous" at
a locus if it has two identical alleles at that locus.

The term genotyping refers to the determination of the genetic information an
individual carries at one or more positions in the genome. For example, genotyping
may comprise the determination of which allele or alleles an individual carries for a
single SNP or the determination of which allele or alleles an individual carries for a
plurality of SNPs. For example, a particular nucleotide in a genome may be an A in
some individuals and a C in other individuals. Those individuals who have an A at the
position have the A allele and those who have a C have the C allele. In a diploid
organism the individual will have two copies of the sequence containing the
polymorphic position so the individual may have an A allele and a C allele or
alternatively two copies of the A allele or two copies of the C allele. Those individuals
who have two copies of the C allele are homozygous for the C allele, those individuals
who have two copies of the A allele are homozygous for the A allele, and those
individuals who have one copy of each allele are heterozygous. The array may be
designed to distinguish between each of these three possible outcomes. A polymorphic
location may have two or more possible alleles and the array may be designed to
distinguish between all possible combinations.
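
The three possible outcomes for a diploid individual at a two-allele position can be written out as a small Python sketch. The function name and the output strings are illustrative assumptions, not part of any array analysis software.

```python
def call_genotype(allele_1: str, allele_2: str) -> str:
    """Describe a diploid genotype from the two alleles carried at one position."""
    if allele_1 == allele_2:
        return f"homozygous {allele_1}"
    # Sort so that A/C and C/A are reported as the same heterozygous genotype.
    return "heterozygous " + "/".join(sorted((allele_1, allele_2)))


# The three outcomes for a position that is A in some individuals and C in others.
outcomes = [call_genotype("A", "A"), call_genotype("C", "C"), call_genotype("C", "A")]
print(outcomes)
```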

Polymorphism refers to the occurrence of two or more genetically determined
alternative sequences or alleles in a population. A polymorphic marker or site is the
locus at which divergence occurs. Preferred markers have at least two alleles, each
occurring at a frequency of preferably greater than 1%, and more preferably greater than
10% or 20% of a selected population. A polymorphism may comprise one or more base
changes, an insertion, a repeat, or a deletion. A polymorphic locus may be as small as
one base pair. Polymorphic markers include restriction fragment length
polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions,
minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple
sequence repeats, insertion elements such as Alu or small insertions or deletions, for
example, deletions or insertions of 1-10 bases. The first identified allelic form is
arbitrarily designated as the reference form and other allelic forms are designated as
alternative or variant alleles. The allelic form occurring most frequently in a selected
population is sometimes referred to as the wild type form. Diploid organisms may be
homozygous or heterozygous for allelic forms. When an organism carries two identical
alleles the organism is homozygous at that position. When an organism carries two
different alleles the organism is heterozygous at that position. Normal cells that are
heterozygous at one or more loci may give rise to tumor cells that are homozygous at
those loci. This loss of heterozygosity may result from structural deletion of normal
genes or loss of the chromosome carrying the normal gene; mitotic recombination
between normal and mutant genes, followed by formation of daughter cells
homozygous for deleted or inactivated (mutant) genes; or loss of the chromosome with
the normal gene and duplication of the chromosome with the deleted or inactivated
(mutant) gene.

Single nucleotide polymorphisms (SNPs) are positions at which two alternative
bases occur at appreciable frequency (>1%) in the human population, and are the most
common type of human genetic variation. The site is usually preceded by and followed
by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100
or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises due to substitution of one
nucleotide for another at the polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another pyrimidine. A transversion is
the replacement of a purine by a pyrimidine or vice versa. Single nucleotide
polymorphisms can also arise from a deletion of a nucleotide or an insertion of a
nucleotide relative to a reference allele.
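
The distinction between a transition and a transversion reduces to checking whether both bases belong to the same chemical class. The sketch below is illustrative; the function name is hypothetical and only the four standard DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Same class (purine->purine or pyrimidine->pyrimidine) is a transition.
    if bases <= PURINES or bases <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"), classify_substitution("A", "C"))
```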

A diallelic polymorphism has two forms in a population. A triallelic
polymorphism has three forms. A polymorphism between two nucleic acids can occur
naturally, or be caused by exposure to or contact with chemicals, enzymes, or other
agents, or exposure to agents that cause damage to nucleic acids, for example,
ultraviolet radiation, mutagens or carcinogens.

Linkage disequilibrium or allelic association means the preferential association
of a particular allele or genetic marker with a specific allele, or genetic marker at a
nearby chromosomal location more frequently than expected by chance for any
particular allele frequency in the population. For example, if locus X has alleles a and
b, which occur equally frequently, and linked locus Y has alleles c and d, which occur
equally frequently, one would expect the combination ac to occur with a frequency of
0.25. If ac occurs more frequently, then alleles a and c are in linkage disequilibrium.
Linkage disequilibrium may result from natural selection of certain combinations of
alleles or because an allele has been introduced into a population too recently to have
reached equilibrium with linked alleles. A marker in linkage disequilibrium can be
particularly useful in detecting susceptibility to disease (or other phenotype)
notwithstanding that the marker does not cause the disease. For example, a marker (X)
that is not itself a causative element of a disease, but which is in linkage disequilibrium
with a gene (including regulatory sequences) (Y) that is a causative element of a
phenotype, can be detected to indicate susceptibility to the disease in circumstances in
which the gene Y may not have been identified or may not be readily detectable.
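
The worked example for alleles a and c can be expressed numerically. The sketch below uses the standard coefficient D, the observed frequency of the combination minus the product of the two allele frequencies; the text does not name this coefficient, so its use here is an assumption, and the observed frequency of 0.375 is a made-up value chosen for illustration.

```python
def linkage_disequilibrium(p_a: float, p_c: float, p_ac: float) -> float:
    """Return D = observed frequency of combination ac minus its expected frequency."""
    return p_ac - p_a * p_c


# Alleles a and c each occur with frequency 0.5, so ac is expected at 0.25.
expected_ac = 0.5 * 0.5
# Hypothetical observed frequency of the ac combination.
d = linkage_disequilibrium(0.5, 0.5, 0.375)
print(expected_ac, d)
```

A value of D above zero indicates that the combination ac occurs more often than chance predicts, which is the situation the paragraph above describes as linkage disequilibrium between a and c.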

Capture probes are oligonucleotides that have a 5' common sequence and a 3'
locus or target specific region or primer. The locus or target specific region is designed
to hybridize near a region of nucleic acid that includes a region of interest so that the
locus or target specific region of the capture probe can be used as a primer and be
extended through the region of interest to make a copy of the region of interest. The
common sequence in the capture probe may be used as a priming site in subsequent
rounds of amplification using a common primer or a limited number of common
primers. The same common sequence may be present in many or all of the capture
probes in a collection of capture probes. Capture probes may also comprise other
sequences, for example, tag sequences that are unique for different species of capture
probes, and endonuclease recognition sites.
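
The layout of a capture probe can be sketched as a small data structure. Everything named here is hypothetical: the class, the example sequences, and in particular the placement of the optional tag at the 5' end, which is an assumption made for the example (the text only fixes the common sequence as 5' and the locus specific region as 3').

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureProbe:
    """A capture probe written 5' to 3': optional tag, common sequence, locus specific region."""
    common: str
    locus_specific: str
    tag: str = ""  # assumed position: 5' of the common sequence

    def sequence(self) -> str:
        return self.tag + self.common + self.locus_specific


# Hypothetical sequences: one shared common sequence, different locus specific regions and tags.
COMMON_A1 = "GTTCAGCAGG"
probes = [
    CaptureProbe(COMMON_A1, "ACGGTCATTG", tag="TTAGC"),
    CaptureProbe(COMMON_A1, "CCTGAAGTCA", tag="GGATC"),
]
for probe in probes:
    print(probe.sequence())
```

Because every probe carries the same common sequence, a single primer can serve all of them in later amplification, while the 3' locus specific region remains free to prime extension.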






A tag or tag sequence is a selected nucleic acid with a specified nucleic acid
sequence. A tag probe has a region that is complementary to a selected tag. A set of
tags or a collection of tags is a collection of specified nucleic acids that may be of
similar length and similar hybridization properties, for example similar Tm. The tags in
a collection of tags bind to tag probes with minimal cross hybridization so that a single
species of tag in the tag set accounts for the majority of tags which bind to a given tag
probe species under hybridization conditions. For additional description of tags and tag
probes and methods of selecting tags and tag probes see USSN 08/626,285 and
EP/0799897, each of which is incorporated herein by reference in its entirety.

A collection of capture probes may be designed to interrogate a collection of
target sequences. The collection would comprise at least one capture probe for each
target sequence to be amplified. There may be multiple different capture probes for a
single target sequence in a collection of capture probes. For example, there may be a
capture probe that hybridizes to one strand of the target sequence and a capture probe
that hybridizes to the opposite strand of the target sequence; these may be referred to as
a forward locus or target specific primer and a reverse locus or target specific primer.
There also may be two or more capture probes that hybridize at different locations
downstream of the target sequence.

A collection of capture probes may be used to amplify a subset of a genome.
The collection of capture probes may be initially used to generate a copy of the target
sequences in the genomic sample and then the copies may be amplified using common
primers. The amplification may be done simultaneously in the same reaction and often
in the same tube.

The term "target sequence", "target nucleic acid" or "target" refers to a nucleic
acid of interest. The target sequence may or may not be of biological significance. As
non-limiting examples, target sequences may include regions of genomic DNA which
are believed to contain one or more polymorphic sites, DNA encoding or believed to
encode genes or portions of genes of known or unknown function, DNA encoding or
believed to encode proteins or portions of proteins of known or unknown function, and
DNA encoding or believed to encode regulatory regions such as promoter sequences,
splicing signals, polyadenylation signals, etc. The number of sequences to be
interrogated can vary, but preferably is from about 1000, 2,000, 5,000, 10,000, 20,000
or 100,000 to 5000, 10,000, 100,000, 1,000,000 or 3,000,000 target sequences.

An "array" comprises a support, preferably solid, with nucleic acid probes
attached to the support. Preferred arrays typically comprise a plurality of different
nucleic acid probes that are coupled to a surface of a substrate in different, known
locations. These arrays, also described as "microarrays" or colloquially "chips", have
been generally described in the art, for example, U.S. Pat. Nos. 5,143,854, 5,445,934,
5,744,305, 5,677,195, 5,800,992, 6,040,193, 5,424,186 and Fodor et al., Science,
251:767-777 (1991), each of which is incorporated by reference in its entirety for all
purposes.

Arrays may generally be produced using a variety of techniques, such as
mechanical synthesis methods or light directed synthesis methods that incorporate a
combination of photolithographic methods and solid phase synthesis methods.
Techniques for the synthesis of these arrays using mechanical synthesis methods are
described in, e.g., U.S. Pat. Nos. 5,384,261, and 6,040,193, which are incorporated
herein by reference in their entirety for all purposes. Although a planar array surface is
preferred, the array may be fabricated on a surface of virtually any shape or even a
multiplicity of surfaces. Arrays may be nucleic acids on beads, gels, polymeric surfaces,
fibers such as fiber optics, glass or any other appropriate substrate. (See U.S. Patent
Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, which are hereby
incorporated by reference in their entirety for all purposes.)

Arrays may be packaged in such a manner as to allow for diagnostic use or can
be an all-inclusive device; e.g., U.S. Patent Nos. 5,856,174 and 5,922,591 incorporated
in their entirety by reference for all purposes.

Preferred arrays are commercially available from Affymetrix under the brand
name GeneChip® and are directed to a variety of purposes, including genotyping and
gene expression monitoring for a variety of eukaryotic and prokaryotic species. (See
Affymetrix Inc., Santa Clara and their website at affymetrix.com.)

Hybridization probes are oligonucleotides capable of binding in a base-specific
manner to a complementary strand of nucleic acid. Such probes include peptide nucleic
acids, as described in Nielsen et al., Science 254, 1497-1500 (1991), and other nucleic
acid analogs and nucleic acid mimetics. See US Patent Application No. 08/630,427,
filed 4/3/96.

The term "hybridization" refers to the process in which two single-stranded
polynucleotides bind non-covalently to form a double-stranded polynucleotide; triple-
stranded hybridization is also theoretically possible. The resulting double-stranded
polynucleotide is a "hybrid." The hybrid may have double-stranded regions and single
stranded regions.

Hybridizations are usually performed under stringent conditions, for example, at
a salt concentration of no more than 1 M and a temperature of at least 25°C. For
example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA,
pH 7.4) and a temperature of 25-30°C are suitable for allele-specific probe
hybridizations. For stringent conditions, see, for example, Sambrook et al., (2001)
which is hereby incorporated by reference in its entirety for all purposes above.

An individual is not limited to a human being, but may also include other 
organisms including but not limited to mammals, plants, bacteria or cells derived from 
any of the above. 


(C.) Multiplexed Anchored Runoff Amplification 

Generally, the invention provides methods for highly multiplexed locus specific
amplification of nucleic acids and methods for analysis of the amplified products. In
some embodiments the invention combines the use of capture probes that comprise a
common sequence and a locus-specific region with adapter-modified sample nucleic
acid; the adapter comprises a second common sequence. The capture probes are
extended to produce copies of the sample DNA that contain common priming sequences
flanking the target sequence. The copies are amplified with a generic set of primers that
recognize the common sequences. The amplified product may be analyzed by
hybridization to an array of probes.

In one embodiment the steps of the invention comprise: generating capture
probes; digesting a nucleic acid sample; ligating adaptors to the fragmented sample;
mixing the fragments and the capture probes under conditions that will allow
hybridization of the fragments and the capture probes; extending the capture probes in
the presence of dNTPs and polymerase; amplifying the extended capture probes; and
detecting the presence or absence of target sequences of interest.

One embodiment of the methods is illustrated in Figure 1. Capture probes are
designed with a locus specific region (LS1f and LS1r) that hybridizes near a target
sequence of interest and a common sequence (A1) that is 5' of the locus specific region.
The common priming site may be present in a plurality of capture probes so that a
primer to A1 may be used for amplification of a plurality of different targets in
subsequent steps. The capture probes are attached to a solid support so that they have a
free 3' end. A plurality of a single species of capture probes may be synthesized at a
discrete location on an array and may form a discrete feature of an array. Each feature
of the array may contain a different species of locus specific capture probe.

Genomic DNA is fragmented and adapters comprising a second common
sequence (A2) are ligated to the fragments. The adapter-ligated fragments are then
mixed with the capture probes under conditions that allow hybridization of the
fragments to the capture probes on the array. The capture probes are then extended
using the adapter-ligated fragments as template. The extension product has a common
sequence, A1, near its 5' end and a second common sequence, A2, near its 3' end. These
common sequences flank a region of interest. The capture probes are then released
from the array and extended capture probes are amplified by PCR using primers to the
common sequences A1 and A2. The amplified product may then be analyzed by, for
example, hybridization to an array. Information about the region of interest can be
determined by analysis of the hybridization pattern.

A second embodiment of the methods is illustrated in Figure 2. Capture probes
are designed with a locus specific region (LS1 or LS2) and a common sequence (A1) as
in Figure 1. In this embodiment the capture probes further comprise a tag sequence that
is unique for each species of capture probe designed. (For a description of tags and tag
probes, see USSN 08/626,285.) The capture probes are attached to the array through
hybridization of the tag sequence to a substantially complementary tag probe sequence
that is attached to the array. The tag probes may be attached to the array in discrete
locations. Different species of tag probes are present at different discrete, spatially
addressable locations. Adapter-ligated genomic DNA is hybridized to the array so that
the capture probes hybridize to target sequences in the sample. The capture probes are
extended as in Figure 1 to incorporate the target sequence and common sequence A2.
The extended capture probes are released and amplified using primers A1 and A2. The
amplified product may then be analyzed by, for example, hybridization to an array.
Information about the region of interest can be determined by analysis of the
hybridization pattern. The amplified sample may be analyzed by any method known in
the art, for example, MALDI-TOF mass spec, capillary electrophoresis, OLA, dynamic
allele specific hybridization (DASH) or TaqMan® (Applied Biosystems, Foster City,
CA). For other methods of genotyping analyses see Syvanen, Nature Rev. Gen. 2:930-
942 (2001) which is herein incorporated by reference in its entirety.

In some embodiments the capture probes are attached to a solid support prior to
hybridization and hybridization takes place while the capture probes are attached to the
solid support. In some embodiments the capture probes are synthesized on a solid
support. Any suitable solid support known in the art may be used, for example, arrays,
beads, microparticles, microtitre dishes and gels may be used. In some embodiments the
capture probes are synthesized on an array in a 5' to 3' direction.

In some embodiments hybridization and extension of capture probes are done
while the capture probes are attached to a solid support. Following extension of the
capture probes, nucleic acids that are not covalently attached to the solid support may be
washed away. In some embodiments the extended capture probes are released from the
solid support prior to amplification. In another embodiment amplification takes place
while the extended capture probes are attached to the solid support. The extended
capture probes may be released from the solid support by, for example, using a
reversible linker or an enzymatic release, such as an endonuclease, or by a change in
conditions that results in disruption of an interaction between the capture probe and the
solid support. For example, when capture probes are associated with the solid support
through base pairing between a tag in the capture probe and a tag probe on the solid
support, disruption of the base pairing interaction releases the capture probes from the
solid support. Enzymatic methods include, for example, use of uracil DNA glycosylase
(UDG or UNG). UNG catalyzes the hydrolysis of DNA that contains deoxyuridine at
the site the uridine is incorporated. Incorporation of one or more uridines in the capture
probe followed by treatment with UNG will result in release of the capture probe from
the solid support. A thermolabile UNG may also be used.

In some embodiments a collection of target sequences is analyzed. A plurality
of capture probes is designed for a plurality of target sequences. In some embodiments
target sequences contain or are predicted to contain a polymorphism, for example, a
SNP. The polymorphism may be, for example, near a gene that is a candidate marker
for a phenotype, useful for diagnosis of a disorder or for carrier screening, or the
polymorphism may define a haplotype block (see Daly et al. Nat Genet 29:229-32
(2001), Rioux et al. Nat Genet 29:223-8 (2001) and U.S. Patent application
10/213,272, each of which is incorporated herein by reference in its entirety). A
collection of capture probes may be designed so that capture probes hybridize near a
polymorphism, for example, within 1, 5, 10, or 100 to 5, 10, 100, 1000, 10,000 or
100,000 bases from the polymorphism. The capture probes hybridize to one strand of
the target sequence and can be extended through the polymorphic site or region so that
the extension product comprises a copy of the polymorphic region.

Many amplification methods are most efficient at amplification of smaller
fragments. For example, PCR most efficiently amplifies fragments that are smaller than
2 kb (see Saiki et al. 1988). In one embodiment capture probes and fragmentation
conditions are selected for efficient amplification of a selected collection of target
sequences. The size of the amplified fragments is dependent on where the target
specific region of the capture probe hybridizes to the target sequence and on the 5' end
of the fragment strand that the capture probe is hybridized to. In some embodiments of
the present methods capture probes and fragmentation methods are designed so that the
target sequence of interest can be amplified as a fragment that is, for example, less than
20,000, 2,000, 800, 500, 400, 200 or 100 base pairs long. The capture probe can be
designed so that the 3' end of the target specific region hybridizes to the base that is just
3' of a position to be interrogated in the target sequence. For example, if the sequence
to be interrogated is a polymorphism and the sequence is 5'-GCTXATCGG-3', where
X is the polymorphic position, the target specific region of the capture probe may have
the sequence 5'-CCGAT-3'. When the sample is fragmented with site specific
restriction enzymes the length of the fragments will also depend on the position of the
nearest recognition site for the enzyme or enzymes used for fragmentation. A collection
of target sequences may be selected based on proximity to restriction sites. In some
embodiments target sequences are selected for amplification and analysis based on the
presence of a sequence of interest, such as a SNP, and proximity to a cleavage site for a
selected restriction enzyme. For example, SNPs that are within 200, 500, 800, 1,000,
1,500, 2,000 or 20,000 base pairs of a restriction site, such as, for example, an
EcoRI site, a BglI site, an XbaI site or any other restriction enzyme site, may be selected
to be target sequences in a collection of target sequences. In another method a
fragmentation method that randomly cleaves the sample into fragments that are 30, 100,
200, 500 or 1,000 to 100, 200, 500, 1,000 or 2,500 base pairs on average may be used.
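
The probe-design rule in the preceding paragraph can be checked with a short
sketch. The Python below is illustrative only and is not part of the disclosed
methods: the function names are hypothetical, and the polymorphic position X is
written as N. It takes the bases immediately 3' of the polymorphic position and
returns their reverse complement, so that the 3' end of the target specific region
pairs with the base just 3' of the interrogated position.

```python
# Illustrative sketch; function names are hypothetical, not from the source.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def target_specific_region(target, snp_index, length):
    """Return a probe region (5' to 3') whose 3' end pairs with the base
    immediately 3' of the polymorphic position at `snp_index` in `target`."""
    downstream = target[snp_index + 1 : snp_index + 1 + length]
    if len(downstream) < length:
        raise ValueError("not enough sequence 3' of the polymorphic position")
    return reverse_complement(downstream)


# The example from the text, with X written as N.
target = "GCTNATCGG"
probe = target_specific_region(target, target.index("N"), 5)
print(probe)  # CCGAT
```

For the sequence 5'-GCTXATCGG-3' and a five-base region this reproduces the
5'-CCGAT-3' probe given in the text.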

In another embodiment, illustrated in Figure 3, the capture probes are in solution
and hybridization and extension take place in solution. In this embodiment the nucleic
acid sample is fragmented and an adapter containing common sequences A2 and A3 is
ligated to the fragments. In some embodiments one strand of the adapter, the strand that
is ligated to the 3' end of the fragment strands, lacks common sequence A2 and is
blocked from extension at the 3' end. Ligation of the blocked adapter strand to the 3'
end of the fragment strands prevents the fragments from being extended to incorporate
A2 at both ends, thus preventing amplification of the fragments by primer A2 in the
subsequent PCR amplification step. Capture probes with locus specific regions and
common sequence A1 are mixed with the adapter-ligated fragments under conditions
that allow hybridization of the capture probes to the adapter-ligated fragments. The
capture probes are extended in the presence of polymerase and dNTPs. In some
embodiments the extended capture probes are positively selected to generate a sample
that is enriched for extended capture probes. In another embodiment extended capture
probes are enriched by depleting non-extended products.
In another embodiment the capture probes comprise a first common sequence, a
tag sequence, a target sequence and a recognition sequence for a Type IIs restriction
enzyme (see Figs. 4a and 4b, SEQ ID NOS: 4-12). The Type IIs recognition site is
inserted within the target specific region so that there is target specific sequence on
either side of the Type IIs recognition sequence and the tag sequence is 3' of the
common sequence. In many embodiments there will be one or more mismatches
between the probe and the target at the Type IIs site. In some embodiments
the Type IIs site is positioned so that when the fragment is digested the enzyme cuts
between the polymorphic position and the base just 5' of the polymorphic position. The
nucleic acid sample is fragmented and ligated to adapters comprising a second common
sequence. The capture probes and adapter-ligated fragments are mixed under
conditions that allow hybridization and the capture probes are extended. The extended
capture probes are then made double stranded using a primer that is complementary to
the adapter. The double stranded extended capture probes are amplified using primers
to the common sequence in the capture probe and the common sequence in the adapter.

To detect the allele or alleles present the amplified fragments are digested with a
Type IIs restriction endonuclease and the fragments (Fig. 4b) are extended in the
presence of labeled ddNTPs. The fragments will be extended by a single ddNTP which
corresponds to the allele present at the polymorphic position. The extended fragments
are hybridized to an array of tag probes and the labeled nucleotide or nucleotides
present at each location are determined. In one embodiment the ddNTPs are all labeled
with the same label, for example, biotin, and the fragments are extended in four separate
reactions, one for each of the four different ddNTPs. Each reaction is hybridized to a
different array so four arrays are used. In another embodiment the ddNTPs are labeled
with differentially detectable labels. In one embodiment there are four different labels
and the extension reaction may be done in a single reaction and the hybridization may
be to a single array. In another embodiment there are two different labels and the
extension reaction may be done in two reactions and the hybridization may be to two
different arrays.
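
The relationship between the number of distinguishable labels and the number of
extension reactions (and arrays) described above is simple arithmetic: each reaction
can carry at most as many ddNTPs as there are labels. The sketch below is a
hypothetical illustration; in particular, which ddNTPs share a reaction in the
two-label case is an assumption, since the text does not specify the grouping.

```python
import math

# The four terminators; the grouping order below is an arbitrary assumption.
DDNTPS = ("ddATP", "ddCTP", "ddGTP", "ddTTP")


def plan_reactions(n_labels):
    """Split the four ddNTPs into reactions so that no reaction contains more
    ddNTPs than there are distinguishable labels. One array per reaction."""
    if not 1 <= n_labels <= len(DDNTPS):
        raise ValueError("n_labels must be between 1 and 4")
    n_reactions = math.ceil(len(DDNTPS) / n_labels)
    return [DDNTPS[i * n_labels : (i + 1) * n_labels] for i in range(n_reactions)]


for labels in (1, 2, 4):
    reactions = plan_reactions(labels)
    print(labels, "label(s):", len(reactions), "reaction(s)/array(s)", reactions)
```

With one label this gives four reactions and four arrays, with two labels two of
each, and with four labels a single reaction and a single array, matching the
embodiments described in the text.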

In many embodiments of the present methods one or more enrichment steps may
be included to generate a sample that is enriched for extended capture probes prior to
amplification with common sequence primers (see Figures 5-7). In some embodiments
it is desirable to separate extended capture probes from fragments from the starting
nucleic acid sample, adapter-ligated fragments, adapter sequences or non-extended
capture probes, for example. In one embodiment (Fig. 5) the capture probes are
extended in the presence of a labeled dNTP, for example dNTPs labeled with biotin.
The labeled nucleotides are incorporated into the extended capture probes and the
labeled extended capture probes are then separated from non-extended material by
affinity chromatography. When the label is biotin the labeled extended capture probes
can be isolated based on the affinity of biotin for avidin, streptavidin or a monoclonal
anti-biotin antibody. In one embodiment the antibody may be coupled to protein-A
agarose, protein-A sepharose or any other suitable solid support known in the art.
Those of skill in the art will appreciate that biotin is one label that may be used but any
other suitable label or a combination of labels may also be used, such as fluorescein,
which may be incorporated in the extended capture probe, and an anti-fluorescein
antibody may be used for affinity purification of extended capture probes. Other labels
such as digoxigenin, Cyanine-3, Cyanine-5, Rhodamine, and Texas Red may also be
used. Antibodies to these labeling compounds may be used for affinity purification.
Also, other haptens conjugated to dNTPs may be used, such as, for example,
dinitrophenol (DNP).

In another embodiment (Fig. 6) capture probes that have been extended through
the adapter sequence (A2) on the adapter modified DNA are made double stranded by
hybridizing and extending A2 primer. Only the fully extended capture probes will have
the A2 priming site so partially extended capture probes will remain single-stranded.
The sample is then digested with a nuclease that selectively digests single stranded
nucleic acid, such as E. coli Exonuclease I. The sample is then amplified with primers
A1 and A2.

In another embodiment (Fig. 7) extension products may be enriched by
circularization followed by digestion with a nuclease such as Exonuclease VII or
Exonuclease III. The extended capture probes may be circularized, for example, by
hybridizing the ends of the extended capture probe to an oligonucleotide splint so that
the ends are juxtaposed and ligating the ends together. The splint will hybridize to the
A1 and A2 sequences in the extended capture probe and bring the 5' end of the capture
probe next to the 3' end of the capture probe so that the ends may be ligated by a ligase,
for example DNA Ligase or Ampligase Thermostable DNA Ligase. See, for example, U.S.
Patent No. 5,871,921 which is incorporated herein by reference. The circularized
product will be resistant to nucleases that require either a free 5' or 3' end.

A variety of nucleases may be used in one or more of the embodiments.
Nucleases that are commercially available and may be useful in the present methods
include: Mung Bean Nuclease, E. coli Exonuclease I, Exonuclease III, Exonuclease
VII, T7 Exonuclease, BAL-31 Exonuclease, Lambda Exonuclease, RecJf, and
Exonuclease T. Different nucleases have specificities for different types of nucleic
acids, making them useful for different applications. Exonuclease I catalyzes the
removal of nucleotides from single-stranded DNA in the 3' to 5' direction.
Exonuclease I degrades excess single-stranded primer oligonucleotide from a reaction
mixture containing double-stranded extension products. Exonuclease III catalyzes the
stepwise removal of mononucleotides from 3'-hydroxyl termini of duplex DNA. A
limited number of nucleotides are removed during each binding event, resulting in
coordinated progressive deletions within the population of DNA molecules. The
preferred substrates are blunt or recessed 3'-termini, although the enzyme also acts at
nicks in duplex DNA to produce single-strand gaps. The enzyme is not active on
single-stranded DNA, and thus 3'-protruding termini are resistant to cleavage. The degree
of resistance depends on the length of the extension, with extensions 4 bases or longer
being essentially resistant to cleavage. This property can be exploited to produce
unidirectional deletions from a linear molecule with one resistant (3'-overhang) and
one susceptible (blunt or 5'-overhang) terminus. Exonuclease VII is a single-strand
directed enzyme with 5' to 3'- and 3' to 5'-exonuclease activities, making it the only
bi-directional E. coli exonuclease with single-strand specificity. The enzyme has no
apparent requirement for divalent cation, and is fully active in the presence of EDTA.
Initial reaction products are acid-insoluble oligonucleotides which are further
hydrolyzed into acid-soluble form. The products of limit digests are small oligomers
(dimers to dodecamers). For additional information about nucleases see catalogues
from manufacturers such as New England Biolabs, Beverly, MA.

In some embodiments one of the primers added for PCR amplification is
modified so that it is resistant to nuclease digestion, for example, by the inclusion of
phosphorothioate. Prior to hybridization to an array one strand of the double stranded
fragments may be digested by a 5' to 3' exonuclease such as T7 Gene 6 Exonuclease.

In some embodiments the nucleic acid sample, which may be, for example,
genomic DNA, is fragmented, using for example, a restriction enzyme, DNase I or a
non-specific fragmentation method such as that disclosed in U.S. patent application No.
09/358,664, which is incorporated herein by reference in its entirety. Adapters
containing at least one priming site are ligated to the fragmented DNA. Locus-specific
primers are synthesized which contain a different adapter sequence at the 5' end. The
adapter-ligated genomic DNA is hybridized to the locus-specific primers and the
locus-specific primer is extended. This may be done, for example, by the addition of DNA
polymerase and dNTPs. Extension products may be amplified with primers that are
specific for the adapter sequences. This allows amplification of a collection of many
different sequences using a limited set of primers. For example, a single set of primers
may be used for amplification. In another embodiment a second amplification step is
carried out using the same or different primers.

In some embodiments the amplified products are analyzed by hybridization to
an array of probes attached to a solid support. In some embodiments an array of probes
is specifically designed to interrogate a collection of target sequences. The array of
probes may interrogate, for example, from 1,000, 5,000, 10,000 or 100,000 to 2,000,
5,000, 10,000, 100,000, 1,000,000 or 3,000,000 different target sequences. In one
embodiment the target sequences contain SNPs and the array of probes is designed to
interrogate the allele or alleles present at one or more polymorphic locations. The array
may comprise a collection of probes that hybridize specifically to one or more SNP
containing sequences. The array may comprise probes that correspond to different
alleles of the SNP. One probe or probe set may hybridize specifically to a first allele of
a SNP, but not hybridize significantly to other alleles of the SNP, and a second probe set
may be designed to hybridize to a second allele of a SNP but not hybridize significantly
to other alleles. A hybridization pattern from the array indicates which of the alleles are
present in the sample. An array may contain probe sets to interrogate, for example,
from 1,000, 5,000, 10,000 or 100,000 to 2,000, 5,000, 10,000, 100,000, 1,000,000 or
3,000,000 different SNPs.

In another embodiment an array of probes that are complementary to tag
sequences present in the capture probes is used to interrogate the target sequences. In
some embodiments the amplified targets are analyzed on an array of tag sequences, for
example, the Affymetrix GenFlex® array (Affymetrix, Inc., Santa Clara, CA). In this
embodiment the capture probes comprise a tag sequence that is unique for each species
of capture probe. A detectable label that is indicative of the allele present at the
polymorphic site of interest is associated with the tag. The labeled tags are hybridized
to the one or more arrays and the hybridization pattern is analyzed to determine which
alleles are present.

In another embodiment methods for generating a plurality of different
oligonucleotides are disclosed. Oligonucleotides are synthesized in parallel on a solid
support. The oligonucleotides are then released from the solid support and used for
further analysis. The released probes may be used, for example, for multiplex PCR
amplification of a collection of target sequences, for probes, for primers for reverse
transcription or amplification or for any other use of oligonucleotides known in the art.
In one embodiment the oligonucleotides on the solid support comprise a collection of
capture probes.

In another embodiment kits that are useful for the present methods are disclosed.
In one embodiment a kit for amplifying a collection of target sequences is disclosed.
The kit may comprise one or more of the following: a collection of capture probes as
disclosed, one or more adapters, one or more generic primers for common sequences,
one or more restriction enzymes, buffer, one or more polymerases, a ligase, buffer,
dNTPs, ddNTPs, and one or more nucleases. The restriction enzyme of the kit may be a
Type IIs enzyme. The capture probes may be attached to a solid support.

METHODS OF USE 
The methods of the presently claimed invention can be used for a wide variety
of applications. Any analysis of genomic DNA may be benefited by a reproducible
method of complexity management. Furthermore, the methods and enriched fragments
of the presently claimed invention are particularly well suited for study and
characterization of extremely large regions of genomic DNA.

In a preferred embodiment, the methods of the presently claimed invention are
used for SNP discovery and to genotype individuals. For example, any of the
procedures described above, alone or in combination, could be used to isolate the SNPs
present in one or more specific regions of genomic DNA. Selection probes could be
designed and manufactured to be used in combination with the methods of the invention
to amplify only those fragments containing regions of interest, for example a region
known to contain a SNP. Arrays could be designed and manufactured on a large scale
basis to interrogate only those fragments containing the regions of interest. Thereafter,
a sample from one or more individuals would be obtained and prepared using the same
techniques which were used to prepare the selection probes or to design the array. Each
sample can then be hybridized to an array and the hybridization pattern can be analyzed
to determine the genotype of each individual or a population of individuals. Methods of
use for polymorphisms and SNP discovery can be found in, for example, US Patent
No. 6,361,947 and co-pending US application No. 08/813,159, which are herein
incorporated by reference in their entirety for all purposes.



Correlation of Polymorphisms with Phenotypic Traits

Most human sequence variation is attributable to or correlated with SNPs, with
the rest attributable to insertions or deletions of one or more bases, repeat length
polymorphisms and rearrangements. On average, SNPs occur every 1,000-2,000 bases
when two human chromosomes are compared. (See The International SNP Map
Working Group, Science 409: 928-933 (2001), incorporated herein by reference in its
entirety for all purposes.) Human diversity is limited not only by the number of SNPs
occurring in the genome but further by the observation that specific combinations of
alleles are found at closely linked sites.

Correlation of individual polymorphisms or groups of polymorphisms with
phenotypic characteristics is a valuable tool in the effort to identify DNA variation that
contributes to population variation in phenotypic traits. Phenotypic traits include
physical characteristics, risk for disease, and response to the environment.
Polymorphisms that correlate with disease are particularly interesting because they
represent mechanisms to accurately diagnose disease and targets for drug treatment.
Hundreds of human diseases have already been correlated with individual
polymorphisms but there are many diseases that are known to have an as yet
unidentified genetic component and many diseases for which a component is or may be
genetic.

Many diseases may correlate with multiple genetic changes, making
identification of the polymorphisms associated with a given disease more difficult. One
approach to overcome this difficulty is to systematically explore the limited set of
common gene variants for association with disease.

To identify correlation between one or more alleles and one or more phenotypic
traits, individuals are tested for the presence or absence of polymorphic markers or
marker sets and for the phenotypic trait or traits of interest. The presence or absence of
a set of polymorphisms is compared for individuals who exhibit a particular trait and
individuals who lack the particular trait to determine if the presence or
absence of a particular allele is associated with the trait of interest. For example, it
might be found that the presence of allele A1 at polymorphism A correlates with heart
disease. As an example of a correlation between a phenotypic trait and more than one
polymorphism, it might be found that allele A1 at polymorphism A and allele B1 at
polymorphism B correlate with a phenotypic trait of interest.
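
The comparison described above, between individuals who exhibit a trait and
individuals who lack it, is commonly summarized in a 2x2 table. The sketch below is
one conventional way to do so and is not a method stated in this document: the
function names are hypothetical, the counts are invented purely for illustration,
and the odds ratio and Pearson chi-square statistic are standard choices swapped in
here as an assumption.

```python
# Illustrative sketch with invented counts; not data from the source.

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a = trait, allele present      b = trait, allele absent
    c = no trait, allele present   d = no trait, allele absent"""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the same 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: 100 individuals with the trait, 100 without.
a, b, c, d = 30, 70, 10, 90
print(round(odds_ratio(a, b, c, d), 3))  # 3.857
print(chi_square_2x2(a, b, c, d))        # 12.5
```

An odds ratio well above 1 together with a large chi-square value would suggest,
for these made-up counts, that the allele is more frequent among individuals with
the trait; real studies would also need to account for multiple testing and
population structure.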

Diagnosis of Disease and Predisposition to Disease

Markers or groups of markers that correlate with the symptoms or occurrence of
disease can be used to diagnose disease or predisposition to disease without regard to
phenotypic manifestation. To diagnose disease or predisposition to disease, individuals
are tested for the presence or absence of polymorphic markers or marker sets that
correlate with one or more diseases. If, for example, the presence of allele A1 at
polymorphism A correlates with coronary artery disease then individuals with allele A1
at polymorphism A may be at an increased risk for the condition.

Individuals can be tested before symptoms of the disease develop. Infants, for
example, can be tested for genetic diseases such as phenylketonuria at birth.
Individuals of any age could be tested to determine risk profiles for the occurrence of
future disease. Often early diagnosis can lead to more effective treatment and
prevention of disease through dietary, behavioral or pharmaceutical interventions.
Individuals can also be tested to determine carrier status for genetic disorders. Potential
parents can use this information to make family planning decisions.

Individuals who develop symptoms of disease that are consistent with more than
one diagnosis can be tested to make a more accurate diagnosis. If, for example,
symptom S is consistent with diseases X, Y or Z, but allele A1 at polymorphism A
correlates with disease X and not with diseases Y or Z, an individual with symptom S is
tested for the presence or absence of allele A1 at polymorphism A. Presence of allele
A1 at polymorphism A is consistent with a diagnosis of disease X. Genetic expression
information discovered through the use of arrays has been used to determine the
specific type of cancer a particular patient has. (See Golub et al. Science 286: 531-537
(2001), hereby incorporated by reference in its entirety for all purposes.)



WO 03/106642    PCT/US03/18853

Pharmacogenomics

Pharmacogenomics refers to the study of how genes affect response to drugs.
There is great heterogeneity in the way individuals respond to medications, in terms of
both host toxicity and treatment efficacy. There are many causes of this variability,
including: severity of the disease being treated; drug interactions; and the individual's
age and nutritional status. Despite the importance of these clinical variables, inherited
differences in the form of genetic polymorphisms can have an even greater influence on
the efficacy and toxicity of medications. Genetic polymorphisms in drug-metabolizing
enzymes, transporters, receptors, and other drug targets have been linked to
interindividual differences in the efficacy and toxicity of many medications. (See
Evans and Relling, Science 286: 487-491 (2001), which is herein incorporated by
reference for all purposes.)

An individual patient has an inherited ability to metabolize, eliminate and
respond to specific drugs. Correlation of polymorphisms with pharmacogenomic traits
identifies those polymorphisms that impact drug toxicity and treatment efficacy. This
information can be used by doctors to determine what course of medicine is best for a
particular patient and by pharmaceutical companies to develop new drugs that target a
particular disease or particular individuals within the population, while decreasing the
likelihood of adverse effects. Drugs can be targeted to groups of individuals who carry
a specific allele or group of alleles. For example, individuals who carry allele A1 at
polymorphism A may respond best to medication X while individuals who carry allele
A2 respond best to medication Y. A trait may be the result of a single polymorphism
but will often be determined by the interplay of several genes.

In addition, some drugs that are highly effective for a large percentage of the
population prove dangerous or even lethal for a very small percentage of the population.
These drugs typically are not available to anyone. Pharmacogenomics can be used to
correlate a specific genotype with an adverse drug response. If pharmaceutical
companies and physicians can accurately identify those patients who would suffer
adverse responses to a particular drug, the drug can be made available on a limited basis
to those who would benefit from the drug.

Similarly, some medications may be highly effective for only a very small
percentage of the population while proving only slightly effective or even ineffective to
a large percentage of patients. Pharmacogenomics allows pharmaceutical companies
to predict which patients would be the ideal candidates for a particular drug, thereby
dramatically reducing failure rates and providing greater incentive to companies to
continue to conduct research into those drugs.

Determination of Relatedness 

There are many circumstances where relatedness between individuals is the
subject of genotype analysis and the present invention can be applied to these
procedures.

Paternity testing is commonly used to establish a biological relationship between a child
and the putative father of that child. Genetic material from the child can be analyzed
for occurrence of polymorphisms and compared to a similar analysis of the putative
father's genetic material. Determination of relatedness is not limited to the relationship
between father and child but can also be done to determine the relatedness between
mother and child (see e.g. Staub et al., U.S. Pat. No. 6,187,540) or, more broadly, to
determine how related one individual is to another, for example, between races or
species or between individuals from geographically separated populations (see for
example H. Kaessmann, et al. Nature Genet 22, 78 (1999)).
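
As a rough illustration of comparing a child's polymorphisms with those of a
putative father, the sketch below flags loci at which the two share no allele. It is
a simplified, hypothetical example: the function name, locus names and genotypes are
invented, and it ignores the mother's genotype, mutation and genotyping error, all
of which a real analysis would need to consider.

```python
# Illustrative sketch; locus names and genotypes are invented.

def excluding_loci(child, putative_father):
    """Return the loci at which the child shares no allele with the putative
    father. Genotypes are given as pairs of alleles keyed by locus name."""
    return [
        locus
        for locus, child_genotype in child.items()
        if not set(child_genotype) & set(putative_father[locus])
    ]


child = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T")}
father = {"snp1": ("A", "A"), "snp2": ("C", "T"), "snp3": ("G", "G")}
print(excluding_loci(child, father))  # ['snp3']
```

An empty result means the genotypes are consistent with the proposed relationship
at every tested locus; it does not by itself establish paternity.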

Forensics 

The capacity to identify a distinguishing or unique set of forensic markers in an
individual is useful for forensic analysis. For example, one can determine whether a
blood sample from a suspect matches a blood or other tissue sample from a crime scene
by determining whether the set of polymorphic forms occupying selected polymorphic
sites is the same in the suspect and the sample. If the set of polymorphic markers does
not match between a suspect and a sample, it can be concluded (barring experimental
error) that the suspect was not the source of the sample. If the set of markers does
match, one can conclude that the DNA from the suspect is consistent with that found at
the crime scene. If frequencies of the polymorphic forms at the loci tested have been
determined (e.g., by analysis of a suitable population of individuals), one can perform a
statistical analysis to determine the probability that a match of suspect and crime scene
sample would occur by chance. A similar comparison of markers can be used to
identify an individual's remains. For example, the U.S. armed forces collect and archive
a tissue sample for each service member. If unidentified human remains are suspected
to be those of an individual, a sample from the remains can be analyzed for markers and
compared to the markers present in the tissue sample initially collected from that
individual.
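
The statistical analysis mentioned above is often carried out with the product
rule. The sketch below is a minimal illustration under stated assumptions that are
not in this document: Hardy-Weinberg proportions at each locus and independence
between loci. The function names, locus names and allele frequencies are
hypothetical.

```python
# Illustrative sketch; allele frequencies and locus names are invented.

def genotype_frequency(genotype, allele_freqs):
    """Expected genotype frequency under Hardy-Weinberg proportions:
    p*p for a homozygote, 2*p*q for a heterozygote."""
    first, second = genotype
    if first == second:
        return allele_freqs[first] ** 2
    return 2 * allele_freqs[first] * allele_freqs[second]


def random_match_probability(profile, population_freqs):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    probability = 1.0
    for locus, genotype in profile.items():
        probability *= genotype_frequency(genotype, population_freqs[locus])
    return probability


population_freqs = {
    "snp1": {"A": 0.5, "G": 0.5},
    "snp2": {"C": 0.2, "T": 0.8},
}
profile = {"snp1": ("A", "G"), "snp2": ("C", "C")}
print(round(random_match_probability(profile, population_freqs), 6))  # 0.02
```

With more loci the product shrinks rapidly, which is why a matching profile across
many independent markers is strong evidence that two samples share a source.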


Marker Assisted Breeding 

Genetic markers can assist breeders in the understanding, selecting and
managing of the genetic complexity of animals and plants. The agriculture industry, for
example, has a great deal of incentive to try to produce crops with desirable traits (high
yield, disease resistance, taste, smell, color, texture, etc.) as consumer demand increases
and expectations change. However, many traits, even when the molecular mechanisms
are known, are too difficult or costly to monitor during production. Readily detectable
polymorphisms which are in close physical proximity to the desired genes can be used
as a proxy to determine whether the desired trait is present or not in a particular
organism. This provides for an efficient screening tool which can accelerate the
selective breeding process.

EXAMPLES 

Example 1. Multiplexed Anchored Runoff Amplification 

Genomic DNA was digested with MseI and ligated to an adapter containing a T7 
promoter sequence as a priming site. The final concentration of the genomic DNA was 
10 ng/µl in 1X T4 DNA Ligase Buffer. To generate extended capture probes, 2.5 µl of 
adapter ligated DNA, 2.5 µl 10X Taq Gold Buffer, 2 µl 25 mM MgCl2, 2.5 µl 10X 
dNTPs, 5 µl of a 500 nM mixture of 150 different capture probes in TE buffer 
corresponding to 150 different forward primers from the HuSNP assay, 0.25 µl Perfect 
Match Enhancer, 0.25 µl AmpliTaq Gold (Applied Biosystems, Foster City, CA) and 10 
µl of water were mixed to give a final reaction volume of 25 µl. The reaction was 
incubated at 95°C for 6 min followed by 26 cycles of 95°C for 30 sec, 68°C for 2.5 min 
(decreasing 0.5°C on each subsequent cycle) and 72°C for 1 min, then to 4°C. 
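
The decreasing annealing temperature in the extension program above can be tabulated with a short sketch. The function name and defaults are illustrative only; the start temperature, decrement and cycle count are the ones stated in the example.

```python
def touchdown_annealing_temps(start_c: float = 68.0,
                              step_c: float = 0.5,
                              cycles: int = 26) -> list:
    """Annealing temperature for each cycle of a touchdown program.

    Cycle 1 anneals at start_c; each subsequent cycle is step_c lower.
    """
    return [start_c - step_c * i for i in range(cycles)]


temps = touchdown_annealing_temps()
for cycle, temp in enumerate(temps, start=1):
    print(f"cycle {cycle:2d}: 95C 30 sec / {temp:.1f}C 2.5 min / 72C 1 min")
```

Under these parameters the last of the 26 cycles anneals at 55.5°C, since the decrement is applied 25 times.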

The extended capture probes were made double stranded by the addition of 0.25 
µl of 1 µM T7 primer and incubation at 95°C for 2 min, 55°C for 2 min, 72°C for 6 
min, then to 4°C. The reaction was passed over a G-25 Sephadex column and 5 µl of 
10X Exonuclease I Buffer (NEB) and 2 µl of Exonuclease I (NEB) were added and the 
reaction was incubated at 37°C for 60 min, 80°C for 20 min, then to 4°C. The products 
were purified over a Qiagen (Valencia, CA) mini-elute column and eluted with 10 µl 
EB Buffer. 

Generic PCR was done as follows: 65.5 µl water, 10 µl 10X Taq Gold Buffer, 8 
µl 25 mM MgCl2, 10 µl 10X dNTPs, 1 µl 1 µM T3 primer, 1 µl 1 µM T7 primer, 3 µl 
DNA, 0.5 µl Perfect Match Enhancer and 1 µl AmpliTaq Gold were mixed in a 100 µl 
final reaction volume and incubated at 95°C for 8 min, 40 cycles of 95°C for 30 sec, 
55°C for 1 min, and 72°C for 1 min, then 72°C for 6 min and finally to 4°C. 
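
As a bookkeeping aid, the component volumes of the generic PCR can be summed and the final concentration of a stock reagent derived from C1·V1 = C2·V2. This is a sketch only. The dictionary keys and function names are hypothetical labels, and the volumes are those listed in the preceding paragraph.

```python
from decimal import Decimal

# Component volumes in microliters, as listed for the generic PCR.
GENERIC_PCR_MIX_UL = {
    "water": "65.5",
    "10X Taq Gold Buffer": "10",
    "25 mM MgCl2": "8",
    "10X dNTPs": "10",
    "T3 primer": "1",
    "T7 primer": "1",
    "DNA": "3",
    "Perfect Match Enhancer": "0.5",
    "AmpliTaq Gold": "1",
}


def total_volume_ul(mix: dict) -> Decimal:
    """Sum component volumes exactly (Decimal avoids float rounding)."""
    return sum(Decimal(v) for v in mix.values())


def final_concentration(stock, volume_ul, total_ul) -> Decimal:
    """C2 = C1 * V1 / V2, in the same units as the stock concentration."""
    return Decimal(stock) * Decimal(volume_ul) / Decimal(total_ul)


total = total_volume_ul(GENERIC_PCR_MIX_UL)
mgcl2_mm = final_concentration("25", GENERIC_PCR_MIX_UL["25 mM MgCl2"], total)
print(f"total volume: {total} ul; final MgCl2: {mgcl2_mm} mM")
```

The listed volumes add up to the stated 100 µl, and the magnesium added from the 25 mM stock comes to 2 mM in the final reaction.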

An aliquot of the reaction was analyzed on a 2% agarose gel. The products were 
concentrated using Qiagen QIAquick columns and eluted with 10 µl EB Buffer. The 
products were fragmented, labeled and hybridized to an array under standard conditions 
and hybridization patterns were analyzed. 

Example 2. Multiplexed Anchored Runoff Amplification with Biotin Enrichment 

Prepare adaptor ligated genomic DNA as above. To generate extended capture 
probes, 2.5 µl of adapter ligated DNA, 2.5 µl 10X Taq Gold Buffer, 2 µl 25 mM MgCl2, 
0.5 µl 50X dNTPs (6 mM dATP, 6 mM dCTP, 10 mM dGTP, 10 mM dTTP), 5 µl of a 
500 nM mixture of 150 different capture probes in TE buffer corresponding to 150 
different forward primers from the HuSNP assay, 0.25 µl Perfect Match Enhancer, 0.25 
µl AmpliTaq Gold, 2 µl 1 mM Biotin-N6-dATP (Perkin Elmer, Boston, MA), 2 µl 1 mM 
Biotin-N4-dCTP (Perkin Elmer) and 8 µl of water were mixed to give a final reaction 
volume of 25 µl. The reaction was incubated at 95°C for 6 min followed by 26 cycles 
of 95°C for 30 sec, 68°C for 2.5 min (decreasing 0.5°C on each subsequent cycle) and 
72°C for 1 min, then to 4°C. Pass reaction over G-25 Sephadex column to remove 
unincorporated biotin-dNTPs. 
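
The nucleotide balance in this extension reaction can be worked out by dilution arithmetic. The sketch below takes the stock concentrations and volumes listed above for the 50X nucleotide mix and the biotinylated nucleotides. The helper name is hypothetical, and the calculation says nothing about how efficiently the polymerase incorporates the labeled analogs.

```python
from fractions import Fraction

REACTION_UL = 25  # final reaction volume stated in the example


def final_um(stock_mm, volume_ul, total_ul=REACTION_UL) -> Fraction:
    """Final concentration in micromolar from a millimolar stock (C1*V1 = C2*V2)."""
    return Fraction(stock_mm) * 1000 * Fraction(volume_ul) / total_ul


# 0.5 ul of the 50X mix (6 mM dATP, 10 mM dGTP) and 2 ul of 1 mM Biotin-N6-dATP.
datp_um = final_um(6, "0.5")
dgtp_um = final_um(10, "0.5")
biotin_datp_um = final_um(1, 2)

# Share of the total adenine nucleotide pool that carries biotin.
biotin_fraction = biotin_datp_um / (datp_um + biotin_datp_um)
print(f"dATP {datp_um} uM + biotin-dATP {biotin_datp_um} uM; dGTP {dgtp_um} uM")
print(f"biotinylated share of the A pool: {float(biotin_fraction):.0%}")
```

On these figures the unlabeled plus biotinylated dATP together match the dGTP concentration, which is consistent with the reduced dATP and dCTP in the 50X mix being a deliberate offset for the labeled analogs. The same arithmetic applies to dCTP and Biotin-N4-dCTP.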

Enrich for biotinylated extension products. Adjust the G-25 eluate to 1X PCR 
buffer and 2 mM MgCl2. Add 15 µl monoclonal anti-biotin agarose (Clone BN-34, 
Sigma). Incubate at room temperature for 30 min with gentle agitation. Spin down 
agarose resin for 3 min at 5,000 rpm. Aspirate away supernatant and wash agarose 
resin with 250 µl 1X PCR buffer with 2 mM MgCl2. Aliquot agarose resin into PCR 
tubes for generic PCR with T3 and T7 primers. 

Generic PCR was done as follows: 65.5 µl water, 10 µl 10X Taq Gold Buffer, 8 
µl 25 mM MgCl2, 10 µl 10X dNTPs, 1 µl 1 µM T3 primer, 1 µl 1 µM T7 primer, 3 µl 
DNA, 0.5 µl Perfect Match Enhancer and 1 µl AmpliTaq Gold were mixed in a 100 µl 
final reaction volume and incubated at 95°C for 8 min, 40 cycles of 95°C for 30 sec, 
55°C for 1 min, and 72°C for 1 min, then 72°C for 6 min and finally to 4°C. 
An aliquot of the reaction was analyzed on a 2% agarose gel. The products were 
concentrated using Qiagen QIAquick columns and eluted with 30 µl EB Buffer. The 
products were fragmented with DNase I, labeled with biotin-ddATP using TdT, and 
hybridized to an array under standard conditions. Hybridization patterns were analyzed. 

Example 3. Multiplexed Anchored Runoff Amplification with Exo III Enrichment. 

Prepare adaptor ligated genomic DNA as above. Kinase capture probes by 
incubating 12 µl of a 150-plex stock of either forward or reverse HuSNP® primers with 
12.7 µl H2O, 3 µl 10X T4 polynucleotide kinase buffer, 0.3 µl 100 mM ATP, and 2 µl T4 
Polynucleotide Kinase. Incubate the reaction at 37°C for 30 min. Adjust reaction 
volume to 50 µl and pass reaction over G-25 column to exchange buffer. 

To generate extended capture probes, 5 µl of adapter ligated DNA, 5 µl 10X Taq 
Gold Buffer, 4 µl 25 mM MgCl2, 5 µl 10X dNTPs, 20 µl of the kinased mixture of 150 
different capture probes, 1 µl Perfect Match Enhancer, 0.5 µl AmpliTaq Gold and 9.5 µl 
of water were mixed to give a final reaction volume of 50 µl. The reaction was 
incubated at 95°C for 6 min followed by 26 cycles of 95°C for 30 sec, 68°C for 2.5 min 
(decreasing 0.5°C on each subsequent cycle) and 72°C for 1 min, then finally to 4°C. 
Pass the reaction over a G-25 column to exchange buffer. 
Convert the single strand extension products to single strand circles using splint 
oligonucleotides and Ampligase Thermostable DNA ligase (Epicentre, Madison, WI). 
The sequence of the T3-T7 splint oligo is (SEQ ID NO: 3) 
5'-TCTCCCTTTAGTGAGGGTTAATTTGTAATACGACTCACTATAGGGCA-3'. 
Mix 39.75 µl water, 7.5 µl 10X Ampligase Buffer, 1.25 µl 70 µM splint oligo, 25 µl 5' 
phosphorylated single strand extension products and 1.5 µl Ampligase Thermostable 
DNA Ligase 5 U/µl. Incubate the mixture at 95°C for 3 min, then 10 cycles of 95°C for 
30 sec and 72°C for 3 min, then 10 cycles of 95°C for 30 sec and 70°C for 3 min, then 
10 cycles of 95°C for 30 sec and 68°C for 3 min, then 10 cycles of 95°C for 30 sec and 
66°C for 3 min, then 10 cycles of 95°C for 30 sec and 64°C for 3 min, then 10 cycles of 
95°C for 30 sec and 62°C for 3 min. Hold at 4°C. Pass reaction over G-25 column to 
exchange buffer. 
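
How the splint bridges the two ends of an extended capture probe can be checked directly from the sequences. The sketch below compares the splint (SEQ ID NO: 3) with the first 23 nucleotides shared by SEQ ID NOS: 8, 10 and 12 in the sequence listing, treated here as the 5' common region of a capture probe. Splitting the splint at position 23 is an assumption made for illustration, as are the variable names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


SPLINT = "TCTCCCTTTAGTGAGGGTTAATTTGTAATACGACTCACTATAGGGCA"  # SEQ ID NO: 3
PROBE_5P_COMMON = "AATTAACCCTCACTAAAGGGAGA"  # first 23 nt of SEQ ID NO: 8, 10, 12
T7_PROMOTER_CORE = "TAATACGACTCACTATAGGG"  # well-known T7 promoter sequence

# Assumed split point: the splint's 5' half pairs with the probe's 5' end,
# and its 3' half pairs with the probe's 3' end (the copied adapter).
splint_5p_half = SPLINT[:len(PROBE_5P_COMMON)]
splint_3p_half = SPLINT[len(PROBE_5P_COMMON):]

pairs_with_probe_5p_end = splint_5p_half == reverse_complement(PROBE_5P_COMMON)
carries_t7 = T7_PROMOTER_CORE in splint_3p_half

# The 3' end an extended probe would need in order to pair with the splint's 3' half.
implied_probe_3p_end = reverse_complement(splint_3p_half)

print(pairs_with_probe_5p_end, carries_t7, implied_probe_3p_end)
```

Read this way, the splint holds the 5' phosphorylated start of the probe next to its 3' end so that the ligase can close the circle, and linear molecules lacking either common sequence remain substrates for the exonuclease step that follows.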

Digest uncircularized nucleic acids. Mix 13 µl water, 10 µl 10X Exo III Buffer, 
75 µl Ampligase/splint reaction and 2 µl Exonuclease III 100 U/µl (NEB, Beverly, 
MA). Incubate at 37°C for 1 hour. Heat inactivate at 70°C for 20 min. Fragment, label 
and hybridize as above. 

CONCLUSION 

From the foregoing it can be seen that the present invention provides a flexible 
and scalable method for analyzing complex samples of DNA, such as genomic DNA. 
These methods are not limited to any particular type of nucleic acid sample: plant, 
bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may 
be analyzed using some or all of the methods disclosed in this invention. This invention 
provides a powerful tool for analysis of complex nucleic acid samples. From 
experiment design to isolation of desired fragments and hybridization to an appropriate 
array, the above invention provides for fast, efficient and inexpensive methods of 
complex nucleic acid analysis. 

All publications and patent applications cited above are incorporated by 
reference in their entirety for all purposes to the same extent as if each individual 
publication or patent application were specifically and individually indicated to be so 
incorporated by reference. Although the present invention has been described in some 
detail by way of illustration and example for purposes of clarity and understanding, it 
will be apparent that certain changes and modifications may be practiced within the 
scope of the appended claims. 

CLAIMS 

What is claimed is: 

1. A method of amplifying a collection of target sequences from a nucleic acid 
sample comprising: 

generating a collection of capture probes comprising a plurality of different 
species of primers wherein each species comprises a first common sequence and a 3' 
variable region that is specific for a target sequence in said collection of target 
sequences, wherein said collection of capture probes is attached to a solid support and 
the 3' end of the capture probes is available for extension; 

fragmenting the nucleic acid sample; 

ligating an adapter to the fragments, said adapter comprising a second common 
sequence; 

hybridizing the adapter-ligated fragments to the collection of capture probes; 

extending the capture probes; and 

amplifying the extended capture probes with first and second common sequence 
primers. 

2. The method of claim 1 further comprising releasing the extended capture probes 
from the solid support prior to amplification. 

3. The method of claim 2 wherein prior to releasing the extended capture probes 
from the solid support nucleic acids that are not covalently attached to the solid support 
are removed. 

4. The method of claim 1 wherein said capture probes are attached to the solid 
support through a covalent interaction. 

5. The method of claim 1 wherein the capture probes further comprise a tag 
sequence that is unique for each species of capture probe and the capture probes are 
attached to the solid support by hybridization to a collection of tag probes that are 
covalently attached to the solid support. 

6. The method of claim 1 wherein each species of capture probe is attached to the 
solid support in a discrete location. 

7. The method of claim 1 wherein prior to amplification the extended capture 
probes are enriched in the sample to be amplified. 

8. The method of claim 1 wherein labeled nucleotides are incorporated into the 
extended capture probes and extended capture probes are isolated by affinity 
chromatography. 

9. The method of claim 8 wherein said labeled nucleotides are labeled with biotin 
and avidin, streptavidin or an anti-biotin antibody is used to isolate extended capture 
probes. 

10. The method of claim 1 wherein prior to amplification the extended capture 
probes are made double stranded and single stranded nucleic acid in the sample is 
digested. 

11. The method of claim 10 wherein the single stranded nucleic acid in the sample is 
digested with a nuclease. 

12. The method of claim 11 wherein the nuclease is Exonuclease I. 



13. The method of claim 1 wherein prior to amplification the extended capture 
probes are circularized and uncircularized nucleic acid in the sample is digested. 

14. The method of claim 13 wherein extended capture probes are circularized by a 
method comprising: 

hybridizing an oligonucleotide splint to the extended capture probes, wherein 
the oligonucleotide splint is complementary to the first and second common sequences, 
thereby juxtaposing the 5' and 3' ends of extended capture probes; and 

ligating the ends of the extended capture probes to form circular extended 
capture probes. 

15. The method of claim 13 wherein the uncircularized nucleic acid remaining in the 
sample is digested with a nuclease. 

16. The method of claim 15 wherein the nuclease is Exonuclease III. 

17. The method of claim 1 wherein the nucleic acid sample is fragmented by 
digestion with one or more restriction enzymes. 

18. The method of claim 1 wherein said capture probes are synthesized on a solid 
support. 

19. The method of claim 1 wherein there are 100 to 1500 different target sequences 
in the collection of target sequences. 

20. The method of claim 1 wherein there are 1,000 to 5,000 different target 
sequences in the collection of target sequences. 

21. The method of claim 1 wherein there are 2,000 to 10,000 different target 
sequences in the collection of target sequences. 

22. The method of claim 1 wherein there are 10,000 to 1,000,000 different target 
sequences in the collection of target sequences. 

23. A method of analyzing a nucleic acid sample comprising: 

amplifying a collection of target sequences from said nucleic acid sample 
according to the method of claim 1; 

hybridizing the amplified collection of target sequences to an array; and 

analyzing the hybridization pattern to detect the presence or absence of target 
sequences from the collection of target sequences. 

24. A method of genotyping one or more polymorphic locations in a sample 
comprising: 

amplifying a collection of target sequences from the sample according to the 
method of claim 1; 

hybridizing the amplified collection of target sequences to an array designed to 
interrogate at least one polymorphic location in the collection of target sequences; and 

analyzing the hybridization pattern to determine the identity of the allele or 
alleles present at one or more polymorphic location in the collection of target 
sequences. 

25. A method for analyzing sequence variations in a population of individuals 
comprising: 

obtaining a nucleic acid sample from each individual; 

amplifying a collection of target sequences from each nucleic acid sample 
according to the method of claim 1; 

hybridizing each amplified collection of target sequences to an array designed to 
interrogate sequence variation in the collection of target sequences to generate a 
hybridization pattern for each sample; and 

analyzing the hybridization patterns to determine the presence or absence of 
sequence variation in the population of individuals. 

26. A method of amplifying a collection of target sequences from a nucleic acid 
sample said method comprising: 

generating a collection of capture probes comprising a plurality of different 
species of primers wherein each species comprises a first common sequence and a 3' 
variable region that is specific for a target sequence in the collection of target 
sequences; 

fragmenting the nucleic acid sample; 

ligating an adapter to the fragments, wherein the adapter is ligated to the 
fragments so that the strand that is ligated to the 5' end of the fragment strands 
comprises a second common sequence and the strand that is ligated to the 3' end of the 
fragments lacks the complement of the second common sequence and is blocked from 
extension at the 3' end; 

hybridizing the adapter-ligated fragments to the collection of capture probes; 

extending the capture probes; and 

amplifying the extended capture probes with first and second common sequence 
primers. 

27. The method of claim 26 wherein an amino group is used to block extension at 
the 3' end of the adapter strand that is ligated to the 3' end of the fragments. 

28. A method of analyzing a nucleic acid sample comprising: 

amplifying a collection of target sequences from said nucleic acid sample 
according to the method of claim 26; 

hybridizing the amplified collection of target sequences to an array; and 

analyzing the hybridization pattern to detect the presence or absence of target 
sequences from the collection of target sequences. 



29. A method of genotyping one or more polymorphic locations in a sample 
comprising: 

preparing an amplified collection of target sequences from the sample according 
to the method of claim 26; 

hybridizing the amplified collection of target sequences to an array designed to 
interrogate at least one polymorphic location in the collection of target sequences; and 

analyzing the hybridization pattern to determine the identity of the allele or 
alleles present at one or more polymorphic location in the collection of target 
sequences. 

30. A method for analyzing sequence variations in a population of individuals 
comprising: 

obtaining a nucleic acid sample from each individual; 

amplifying a collection of target sequences from each nucleic acid sample 
according to the method of claim 26; 

hybridizing each amplified collection of target sequences to an array designed to 
15 interrogate sequence variation in the collection of target sequences to generate a 
hybridization pattern for each sample; and 

analyzing the hybridization patterns to determine the presence or absence of 
sequence variation in the population of individuals. 

31. The method of claim 26 wherein the nucleic acid sample is fragmented by 
digestion with one or more restriction enzymes. 

32. The method of claim 26 wherein prior to amplification the extension products 
are enriched in the sample to be amplified. 

33. The method of claim 26 wherein labeled nucleotides are incorporated into the 
extension products and the extension products are enriched by affinity chromatography. 

34. The method of claim 33 wherein said labeled nucleotides are labeled with biotin 
and avidin, streptavidin or an anti-biotin antibody is used to isolate extension products. 

35. The method of claim 26 wherein prior to amplification the extended capture 
probes are made double stranded and single stranded nucleic acid in the sample is 
digested. 

36. The method of claim 35 wherein the single stranded nucleic acid in the sample is 
digested with a nuclease. 

37. The method of claim 36 wherein the nuclease is Exonuclease I. 

38. The method of claim 26 wherein prior to amplification the extended capture 
probes are circularized and uncircularized nucleic acid in the sample is digested. 

39. The method of claim 38 wherein extended capture probes are circularized by a 
method comprising: 

hybridizing an oligonucleotide splint to the extended capture probes, wherein 
the splint is complementary to the first and second common sequences, thereby 
juxtaposing the 5' and 3' ends of extended capture probes; and 

ligating the ends of the extended capture probes to form circular extended 
capture probes. 

40. The method of claim 38 wherein the uncircularized nucleic acid remaining in the 
sample is digested with a nuclease. 

41. The method of claim 40 wherein the nuclease is Exonuclease III. 



42. The method of claim 26 wherein there are 100 to 1,500 different target 
sequences in the collection of target sequences. 

43. The method of claim 26 wherein there are 1,000 to 5,000 different target 
sequences in the collection of target sequences. 

44. The method of claim 26 wherein there are 2,000 to 10,000 different target 
sequences in the collection of target sequences. 

45. The method of claim 26 wherein there are 10,000 to 1,000,000 different target 
sequences in the collection of target sequences. 

46. A method for genotyping one or more polymorphisms in a nucleic acid sample 
comprising: 

fragmenting the nucleic acid sample to generate fragments; 

ligating an adapter to the fragments, said adapter comprising a first common 
sequence; 

hybridizing a collection of capture probes to the fragments, wherein said capture 
probes comprise a second common sequence, a tag sequence unique for each species of 
capture probe, a first target specific sequence, a Type IIs restriction enzyme recognition 
sequence, and a second target specific sequence wherein the Type IIs restriction enzyme 
recognition sequence is positioned so that the enzyme will cut on the 5' side of the 
polymorphic base; 

extending said capture probes to generate extended capture probes; 

amplifying the extended capture probes with first and second common sequence 
primers; 

digesting the amplified product with a Type IIs restriction enzyme to generate 
amplified product fragments; 

extending the amplified product fragments in at least one extension reaction; 

hybridizing each extension reaction to an array comprising tag probes that 
hybridize to the tag sequences in the capture probes; and 

analyzing the hybridization pattern on each of the arrays to determine at least 
one genotype. 

47. The method of claim 46 wherein in each extension reaction there is at least one 
species of labeled ddNTP. 

48. The method of claim 47 wherein one or more species of ddNTPs is labeled with 
biotin. 

49. The method of claim 47 wherein there are four separate extension reactions 
wherein each extension reaction corresponds to a different species of labeled ddNTP 
and each extension reaction is hybridized to a different array. 

50. The method of claim 47 wherein there are two separate extension reactions 
wherein two differentially labeled ddNTPs are present in each extension reaction and 
each extension reaction is hybridized to a different array. 

51. The method of claim 47 wherein there is one extension reaction wherein four 
differentially labeled ddNTPs are present in the extension reaction and the extension 
reaction is hybridized to a single array. 

52. The method of claim 46 wherein said capture probes are attached to a solid 
support. 

53. The method of claim 52 wherein said capture probes are attached to the solid 
support through a covalent interaction. 

54. The method of claim 52 wherein said capture probes are attached to said solid 
support by hybridization to a collection of tag probes that are attached to said solid 
support. 

55. The method of claim 52 wherein each species of capture probe is attached to 
said solid support in a discrete location. 

56. The method of claim 52 wherein said capture probes are synthesized on a solid 
support in a 5' to 3' direction. 

57. The method of claim 46 wherein one of the common sequence primers is 
resistant to nuclease digestion and each extension reaction is digested with a 5' to 3' 
nuclease activity prior to hybridization to an array. 

58. The method of claim 57 wherein the nuclease resistant common sequence 
primer comprises phosphorothioate linkages. 

59. The method of claim 57 wherein said nuclease is T7 Gene 6 Exonuclease. 

60. The method of claim 46 wherein prior to amplification the extended capture 
probes are enriched in the sample to be amplified. 

61. The method of claim 46 wherein labeled nucleotides are incorporated into the 
extended capture probes and extended capture probes are isolated by affinity 
chromatography. 

62. The method of claim 61 wherein said labeled nucleotides are labeled with biotin 
and avidin, streptavidin or an anti-biotin antibody is used to isolate extended capture 
probes. 

63. The method of claim 46 wherein prior to amplification the extended capture 
probes are made double stranded and single stranded nucleic acid in the sample is 
digested. 

64. The method of claim 63 wherein the single stranded nucleic acid in the sample is 
digested with a nuclease. 

65. The method of claim 64 wherein the nuclease is Exonuclease I. 

66. The method of claim 46 wherein prior to amplification the extended capture 
probes are circularized and uncircularized nucleic acid in the sample is digested. 

67. The method of claim 66 wherein extended capture probes are circularized by a 
method comprising: 

hybridizing an oligonucleotide splint to the extended capture probes, wherein 
the oligonucleotide splint is complementary to the first and second common sequences, 
thereby juxtaposing the 5' and 3' ends of extended capture probes; and 

ligating the ends of the extended capture probes to form circular extended 
capture probes. 

68. The method of claim 66 wherein the uncircularized nucleic acid remaining in the 
sample is digested with a nuclease. 

69. The method of claim 68 wherein the nuclease is Exonuclease III. 

70. The method of claim 46 wherein the nucleic acid sample is fragmented by 
digestion with one or more restriction enzymes. 

71. The method of claim 46 wherein there are 100 to 1500 different target sequences 
in the collection of target sequences. 

72. The method of claim 46 wherein there are 1,000 to 5,000 different target 
sequences in the collection of target sequences. 

73. The method of claim 46 wherein there are 2,000 to 10,000 different target 
sequences in the collection of target sequences. 

74. The method of claim 46 wherein there are 10,000 to 1,000,000 different target 
sequences in the collection of target sequences. 

75. A method for screening for sequence variations in a population of individuals 
comprising: 

providing a nucleic acid sample from each individual; 

genotyping each sample according to the method of claim 46; and 

comparing the genotypes from the samples to determine the presence or absence 
of sequence variation in the population of individuals. 

76. A kit for amplifying a collection of target sequences said kit comprising: 

a collection of capture probes, wherein each species of capture probe comprises 
a first common sequence and a 3' variable region that is specific for a target sequence in 
said collection of target sequences; 

an adapter comprising a second common sequence; and 

a pair of first and second common sequence primers. 

77. The kit of claim 76 wherein said collection of capture probes is covalently 
attached to a solid support and the 3' end of the capture probes is available for 
extension. 

78. The kit of claim 76 further comprising a restriction enzyme, buffer, DNA 
polymerase and dNTPs. 

79. A kit for amplifying a collection of target sequences said kit comprising: 

a collection of capture probes, wherein each species of capture probe comprises 
a first common sequence, a tag sequence unique for each species of capture probe, a 
first target specific sequence, a Type IIs restriction enzyme recognition sequence, and a 
second target specific sequence; 

an adapter comprising a first strand comprising a second common sequence and 
a second strand that does not contain the complement of the second common sequence 
and is blocked from extension at the 3' end; and 

a pair of first and second common sequence primers. 

80. The kit of claim 79 further comprising a Type IIs restriction enzyme, a ligase, 
dNTPs, ddNTPs, buffer and DNA polymerase. 

81. The kit of claim 79 wherein one of the common sequence primers is resistant to 
nuclease digestion. 

82. A collection of capture probes attached to a solid support, wherein said solid 
support is selected from the group consisting of arrays, beads, microparticles, microtitre 
dishes and gels. 

83. A collection of capture probes, wherein each species of capture probe comprises 
a first common sequence, a tag sequence unique for each species of capture probe, a 
first target specific sequence, a Type IIs restriction enzyme recognition sequence, and a 
second target specific sequence, wherein the target specific sequences are specific for 
targets in a collection of target sequences. 

84. A method of generating a plurality of oligonucleotides comprising: synthesizing 
a plurality of oligonucleotides on a solid support and releasing the plurality of 
oligonucleotides from said solid support. 

85. The method of claim 84 wherein said plurality of oligonucleotides comprises a 
collection of capture probes. 

SEQUENCE LISTING 

<110> Affymetrix, Inc. 
Jones, Keith W. 
Shapero, Michael 
Liu, Weiwei 

<120> COMPLEXITY MANAGEMENT OF GENOMIC DNA BY 
LOCUS SPECIFIC AMPLIFICATION 



<130> 2719.2029001 

<150> US 10/272,155 
<151> 2002-10-14 

<150> US 60/389,747 
<151> 2002-06-17 

<160> 12 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 11 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Type-II endonuclease EarI recognition sequence 

<221> misc_feature 
<222> 7, 8, 9, 10, 11 
<223> n = A,T,C or G 

<400> 1 
ctcttcnnnn n 

<210> 2 
<211> 11 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Type-II endonuclease EarI recognition sequence 

<221> misc_feature 
<222> 7, 8, 9, 10, 11 
<223> n = A,T,C or G 

<400> 2 
nnnnngaaga g 

<210> 3 
<211> 47 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> T3-T7 splint oligonucleotide 
<400> 3 

tctcccttta gtgagggtta atttgtaata cgactcacta tagggca 47 

<210> 4 
<211> 30 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Oligonucleotide 
<400> 4 

aagattctaa taacctcgca gcgtgaaaac 30 

<210> 5 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 5 

cttttttgag gcatgtttngt tttcacctta agaggtt 37 

<210> 6 
<211> 48 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 6 

aagattctaa taacctcgca gcgtgaaaac kaacatgcct caaaaaag 48 

<210> 7 
<211> 48 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 7 

cttttttgag gcatgttmgt tttcacgctg cgaggttatt agaatctt 48 

<210> 8 
<211> 79 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<221> misc_feature 
<222> 75, 76, 77, 78, 79 
<223> n = A,T,C or G 

<400> 8 

aattaaccct cactaaaggg agacgttcct aaagctgagt ctgaagattc taataacctc 60 
gcagcgtgaa aactannnn 79 

<210> 9 
<211> 79 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<221> misc_feature 
<222> 1, 2, 3, 4, 5 
<223> n = A,T,C or G 

<400> 9 

nnnnnmgttt tcacgctgcg aggttattag aatcttcaga ctcagcttta ggaacgtctc 60 
cctttagtga gggttaatt 79 

<210> 10 
<211> 73 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 10 

aattaaccct cactaaaggg agacgttcct aaagctgagt ctgaagattc taataacctc 60 
gcagcgtgaa aac 73 

<210> 11 
<211> 77 

<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Oligonucleotide 

<221> misc_feature 

<222> 1, 2, 3 

<223> n = A,T,C or G 

<400> 11 

nnnmgttttc acgctgcgag gttattagaa tcttcagact cagctttagg aacgtctccc 60 
tttagtgagg gttaatt 77 

<210> 12 
<211> 74 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 12 

aattaaccct cactaaaggg agacgttcct aaagctgagt ctgaagattc taataacctc 60 
gcagcgtgaa aack 74 



(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 
(43) International Publication Date (10) International Publication Number 
(51) International Patent Classification: C12Q 1/68, 
C12P 19/34, C07H 21/04 

(21) International Application Number: 
(22) International Filing Date: 16 June 2003 (16.06.2003) 

(25) Filing Language: English 

(26) Publication Language: English 


